The  Second  International 
Conference  on 
Intelligent  Processing 
and  Manufacturing 
of  Materials 


Volume  1 




DISTRIBUTION  STATEMENT  A 
Approved  for  Public  Release 
Distribution  Unlimited 


20000627 


REPORT  DOCUMENTATION  PAGE 


Form  Approved 
OMB  NO.  0704-0188 


Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources,
gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection
of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway,
Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503.

1. AGENCY USE ONLY (Leave Blank)
2. REPORT DATE: June 2000
3. REPORT TYPE AND DATES COVERED: Final Report


4.  TITLE  AND  SUBTITLE 

IPMM'99  The  Second  International  Conference  on  Intelligent  Processing  and 
Manufacturing  of  Materials,  VOLUME  1  AND  VOLUME  2 

6.  AUTHOR(S) 

John  A.  Meech,  principal  investigator 

7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

University  of  British  Columbia 
Vancouver,  BC,  V6T-1Z4 


9.  SPONSORING  /  MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

U.  S.  Army  Research  Office 
P.O.  Box  12211 

Research  Triangle  Park,  NC  27709-2211 


5.  FUNDING  NUMBERS 


DAAD29-99-1-0074


8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER 


10.  SPONSORING  /  MONITORING 
AGENCY  REPORT  NUMBER 

ARO 39677.1-RT-CF


11.  SUPPLEMENTARY  NOTES 

The  views,  opinions  and/or  findings  contained  in  this  report  are  those  of  the  author(s)  and  should  not  be  construed  as  an  official 
Department  of  the  Army  position,  policy  or  decision,  unless  so  designated  by  other  documentation. 


12  a.  DISTRIBUTION  /  AVAILABILITY  STATEMENT  12  b.  DISTRIBUTION  CODE 

Approved  for  public  release;  distribution  unlimited. 

13.  ABSTRACT  (Maximum  200  words) 

The  second  International  Conference  on  Intelligent  Processing  and  Manufacturing  of  Materials  was  held  in  Honolulu,  Hawaii  on 
July  10-15,  1999.  IPMM'99  is  the  second  in  a  series  of  conferences  dealing  with  the  application  of  Artificial  Intelligence  and  related 
technologies such as expert systems, fuzzy logic, artificial neural networks, genetic algorithms, pattern recognition and hybrid systems
to  the  processing  and  manufacturing  of  materials  and  products. 

The  theme  of  this  year's  conference  is 

"Intelligence  in  Materials  Production  -  The  Competitive  Edge!" 


14.  SUBJECT  TERMS 


15.  NUMBER  OF  PAGES 


16.  PRICE  CODE 


Proceedings 


of  the  Second  International  Conference 


on 

Intelligent  Processing  and  Manufacturing  of  Materials 


IPMM’99 


Volume  1 


Editors: 

John  A.  Meech,  Marcello  M.  Veiga, 
Michael  H.  Smith,  Steven  R.  LeClair 


Hilton  Hawaiian  Village  Hotel 
Honolulu,  Hawaii 

July  10  -  15, 1999 



IPMM’99 


Foreword 

It  is  a  great  pleasure  to  welcome  you  to  Hawaii  and  to  the  Second  International  Conference  on  Intelligent 
Processing  and  Manufacturing  of  Materials. 

The  theme  of  this  year's  conference  is 

"Intelligence  in  Materials  Production  -  the  Competitive  Edge!" 

"We  are  living  in  a  Material  World"  sings  Madonna  and  throughout  the  ages,  materials  have  been  essential 
for  bettering  our  standard  of  living.  All  materials  derive  from  the  Earth's  crust,  oceans  or  atmosphere  and 
soon,  even  from  outer  space.  By  applying  human  intelligence  to  the  properties  of  matter  and  the 
environment  of  a  problem,  Mankind  has  developed  countless  materials,  goods  and  products  to  serve 
Society's  needs.  Perhaps  Madonna's  song  should  refer  to  an  "Intelligent  World". 

IPMM'99  is  the  second  in  a  series  of  conferences  dealing  with  the  application  of  Artificial  Intelligence  and 
related  technologies  such  as  expert  systems,  fuzzy  logic,  artificial  neural  networks,  genetic  algorithms, 
pattern  recognition  and  hybrid  systems  to  the  processing  and  manufacturing  of  materials  and  products.  The 
1st IPMM Conference was held in 1997 in Gold Coast, Australia and attracted over 300 delegates from 37
countries  with  a  diverse  set  of  backgrounds  that  included  computing,  mining,  metals,  materials, 
manufacturing,  etc.  The  participants  found  much  to  share  in  the  "intelligent"  methods  being  used  around 
the  world  to  study,  simulate,  process  or  make  materials  and  products.  The  cross-disciplinary  nature  of  this 
conference  series  is  a  "breath  of  fresh  air"  to  many  of  us. 

In  the  production  of  ores,  minerals,  metals,  ceramics,  plastics  or  food,  intelligent  methods  have  become 
essential  to  better  understand  and  process  materials  or  to  manufacture  products.  Intelligence  is  embodied  in 
creative  ways  to  select  components,  predict  properties,  control  processes  or  operate  plants  and  factories. 
Such  methods  may  be  software  or  hardware  applications;  they  may  mimic  how  the  human  mind  processes 
information;  or  they  may  derive  from  first-principle  modeling  of  the  physics  and  chemistry  of  matter. 

Corporations  are  increasingly  turning  to  intelligent  methods  to  enhance  their  competitiveness  in  today's 
complex  society  and  so,  the  technical  program  at  IPMM'99  is  focused  on  research  aimed  at  leading-edge 
industrial  applications  and  on  the  identification  of  newly-evolving  technologies. 

Intelligence  exists  all  around  us.  Each  of  us  uses  it  to  conduct  our  daily  lives.  As  the  world  becomes 
increasingly  more  complex  and  as  communication  systems  allow  massive  transfer  of  information  at 
incredibly  reduced  time  scales,  the  global  community  will  begin  to  apply  this  rapid  collection  of  knowledge 
through powerful massively-parallel systems that currently exist within our families, communities, towns
and cities, states and countries. As computers become more and more predominant in our workplaces and
homes, we will begin to consider problems to which, previously, we could only apply our imaginations.

Intelligence that exists in humans and other species is now being placed into machines and materials. We
are  applying  intelligence  as  we  explore  outer  space  and  yes,  perhaps,  one  day  we  will  discover  new 
intelligent  life  forms  in  the  universe. 

Conventional  approaches  to  problem  solving  are  becoming  more  and  more  integrated  into  systems  that  are 
controlled using fuzzy logic, artificial neural networks and genetic algorithms to create hybrid systems. As
these  systems  become  more  widely  used  in  industry,  the  complexity  issues  will  grow  as  we  attempt  to  find 
"optimum"  solutions  to  our  problems.  You  will  find  many  papers  at  IPMM'99  dealing  with  hybrid  systems 
that  combine  the  attributes  of  many  different  methodologies. 




The  methodologies  may  mimic  the  human  thought-process  either  symbolically  or  structurally.  Papers  are 
available  describing  evolutionary  techniques  that  adapt  to  changing  circumstances  and  allow  solutions  to 
problems  to  adjust  in  response  to  external  factors.  A  number  of  papers  focus  on  developing  instruments  that 
provide  artificial  senses  that  mimic  the  eye,  the  nose,  the  ears  and  yes,  even  the  tongue.  Tactile  activities 
are also important in robotic fields and so even the sense of touch is described in some papers.

As we examine these proceedings and their many fascinating areas of research, I wish to issue a few
challenges  that  we  face  in  developing  new  products  to  assist  us  in  our  future  lives.  Some  of  these  ideas 
came  to  mind  from  reading  the  papers  and  still  others  developed  from  the  difficult  exercise  of  putting 
together  this  conference  and  proceedings. 

Challenge 1: Can we find a way to put a film onto the surface of eye-glasses that changes its refractive
index  in  response  to  external  light  and/or  the  distance  at  which  the  wearer  is  focusing? 
Perhaps,  the  film  would  have  a  variable  R.I.  from  top  to  bottom  of  the  lens. 

Challenge  2:  Can  we  develop  hearing  aids  that  actually  work  properly  —  which  filter  out  extraneous  noise 
and  provide  quality  hearing  to  those  of  us  impaired? 

Challenge  3:  Can  we  develop  a  word  processing  program  which  always  prints  out  documents  the  way  they 
were  originally  designed  regardless  of  the  print  driver  and  hardware  being  used? 

The first two challenges could revolutionize the field of hearing and sight aids and improve the quality of life
for many, many people. A solution to the third challenge probably exists already but is not being marketed in a way to
be of widespread use. Much time and effort must be spent by those of us who use word processors every day to
reformat documents as we move around our offices or as we move from home to office.

Of  course,  these  challenges  are  trivial  compared  to  some  of  the  more  fundamental  (environmental,  political 
and social) issues facing the world today. But solving even such small problems can have an
enormous  impact  on  so  many  people.  The  opportunity  to  apply  intelligence  exists  in  everything  we  do  or 
make.  It  is  up  to  those  of  us  involved  in  the  field  to  see  that  the  intelligent  methods  are  applied  for  the  good 
of  Mankind. 

There  are  many  people  who  contributed  to  the  success  of  this  conference.  For  their  advice  and  patience,  I 
would  like  to  thank  the  following  individuals:  Marcello  Veiga,  Mike  Smith,  Steve  LeClair,  Tom  Zacharia, 
Guy  Nicoletti,  Ed  Szczerbicki,  Madjid  Fathi,  Malcolm  Scoble,  Tara  Chandra,  Lotfi  Zadeh,  Zoran 
Bugarinovic,  Robert  Wagoner,  Junichi  Endou,  Susuma  Shima,  Debbie  McCoy,  Iqbal  Ahmad,  John 
Atkinson,  Scott  Meech,  Sonia  Veiga,  Bojan  Bugarinovic,  Igor  Bugarinovic.  Special  Mahalo  (thanks)  to 
Epooni  Perkins,  Stan  Omizo  and  Lisa  Chang  of  the  Hilton  Hawaiian  Village  for  their  support  and  patience. 

We  trust  you  will  find  these  proceedings  of  great  benefit  in  your  future  endeavors  and  research. 


John  A.  Meech 
General  Chair,  IPMM'99 


Honolulu,  Hawaii,  USA 
May  31,  1999. 




IPMM'99  Organizing  Committee 


Honorary  Chair: 

Lotfi  Zadeh,  University  of  California,  Berkeley,  CA,  USA 
General  Chair: 

John  A.  Meech,  University  of  British  Columbia,  Vancouver,  BC,  Canada 
Program  Co-Chairs: 

Michael  H.  Smith,  University  of  California,  Berkeley,  CA,  USA 
Marcello  M.  Veiga,  University  of  British  Columbia,  Vancouver,  BC,  Canada 


Vice-Chairs: 

N.  America:  Thomas  Zacharia,  Oak  Ridge  National  Laboratory,  TN,  USA 

Central  /S.  America:  Marcello  Veiga,  University  of  British  Columbia,  Canada 

Europe:  Madjid  Fathi,  University  of  Dortmund,  Germany 

Australia:  Edward  Szczerbicki,  University  of  Newcastle,  Australia 

Japan:  Susuma  Shima,  Kyoto  University,  Japan 

Asia:  Tara  Chandra,  University  of  Wollongong,  Australia 

Special  Sessions:  Steven  R.  LeClair,  Wright-Patterson  Air  Force  Base,  OH,  US 


Workshop  Chair: 

Guy  M.  Nicoletti,  University  of  Pittsburgh,  Greensburg,  PA,  USA 
Exhibition  Chair: 

Debbie  McCoy,  Oak  Ridge  National  Laboratory,  Tennessee,  USA 
Program  Committee: 

I.  Ahmad,  USA  T.  Aizawa,  Japan 

H.  Asanuma,  Japan  J.F.  Atkinson,  USA 


M.B.  Balachandran,  Austr: 
C.J.  Davies,  Australia 
M.A.  Elbestawi,  Canada 

W.  Gruver,  Canada 

H.  Henein,  Canada 

C. G.  Kang,  Korea 
Y.  Kawazoe,  Japan 
J.  Leopold,  Germany 

W. J.  Lui,  Canada 
J.P.  McGeer,  Canada 

X.  Nui,  Singapore 
J.C.  Paschoal,  Brasil 
J.  Pieper,  Canada 
M.  Scoble,  Canada 

Y.  Takefuji,  Japan 
J.S.L.  van  Deventer,  Aust. 
R.  Villas  Boas,  Brasil 

D.  Yuen,  Australia 


J.H.  Beynon,  U.K. 

S.  Dolinsek,  Slovenia 
J.  Endou,  Japan 

J. S.  Gunasekera,  USA 
H.  Jalkanen,  Finland 
R.  Kaspar,  Germany 

A.  Kusiak,  USA 
D.A.  Linkens,  U.K. 

K.  Manabe,  Japan 

B.  Mehta,  USA 

T.  Ono,  Japan 

W.  Pedrycz,  Canada 
Y.  Saito,  Japan 
J.  Sestak,  Czech  Republic 
Y.  Tamera,  Japan 
T.  Van  Le,  Australia 
R.H.  Wagoner,  USA 
D.  Daniel  Zhu,  USA 


A. E.  Araujo,  Brasil 

D.  Barschdorff,  Germany 

L.  Cser,  Finland 

M. A.  Duarte-Mermoud,  Chile 
M.  Geiger,  Germany 

R.  Guthrie,  Canada 
J.J.  Jonas,  Canada 
J.  Keller,  USA 
Y.C.  Lam,  Australia 
R.S.  Liu,  P.R.  China 

B.  Marett,  Australia 

L.  Monostori,  Hungary 
P.H.  Osanna,  Austria 
P.  Peussa,  Finland 
T.  Sakai,  Japan 

G. W.  Shuy,  Taiwan 

I. B.  Turksen,  Canada 
B.  Verma,  Australia 

J.  Yen,  USA 

H. J.  Zimmerman,  Germany 




IPMM’99 

Affiliated  Organizations  and  Sponsors 


Sponsors 

The  Conference  is  supported  financially  by 

The  Department  of  the  Army1,  US  Army  Research  Office,  NC 
The  US  Air  Force,  Wright-Patterson  AF-Base,  Dayton,  OH;  and 
Oak  Ridge  National  Laboratory,  Oak  Ridge,  TN 


Affiliations 

The  Organizing  Committee  is  grateful  for  the  affiliation  with  and  cooperation  received  from  the  following 
organizations: 


Society  for  the  Advancement  of  Material  &  Process  Engineering  (SAMPE) 
The  Metallurgical  Society  (TMS)  of  AIME 
Japan  Society  for  Technology  of  Plasticity  (JSTP) 

Australasia  Institute  of  Metals  and  Materials 
Institute  of  Mining  and  Metallurgy 

Canadian  Institute  of  Mining,  Metallurgy  and  Petroleum  (CIM) 

North  American  Fuzzy  Information  Processing  Society 
IEEE  Systems,  Man  and  Cybernetics  (SMC)  Society 


1  The  views,  opinions,  and/or  findings  contained  in  this  report  are  those  of  the  author(s)  and  should  not  be  construed  as 
an  official  Department  of  the  Army  position,  policy,  or  decision,  unless  so  designated  by  other  documentation. 




About  IPMM 


Intelligent  Processing  and  Manufacturing  of  Materials  is  an  informal  international  community  of  people 
interested  in  intelligent  software  and  hardware  applications  and  solutions  to  problems  that  exist  in  the 
creation  and  manufacture  of  minerals,  metals,  materials  and  products.  IPMM  holds  an  international 
conference  every  two  years  in  July  and  sponsors  a  web  site  for  the  dissemination  of  relevant  information  to 
workers  and  researchers  in  the  field. 

The  first  Forum  on  IPMM  took  place  in  the  Gold  Coast  in  Australia  in  July  1997.  The  technical  program 
consisted  of  240  papers  on  a  wide  range  of  topics  that  included  fuzzy  systems,  artificial  neural  networks, 
genetic  algorithms,  first-principle  modeling,  finite-element-analysis  and  thermodynamics.  Most  papers 
focused  on  applications  to  real-world  problems  as  opposed  to  theoretical  analyses.  Over  300  delegates 
attended from 35 countries and the success of the conference led to a decision to organize a follow-up
conference  in  1999. 

Hawaii  was  chosen  as  the  venue  for  IPMM'99  and  the  planning  of  the  event  began  in  late  August  1997. 
Over the past two years a number of world events have created difficulties in sustaining interest in IPMM.
These  include  the  1998  Asian  economic  crisis  and  the  late  scheduling  of  several  major,  related  conferences 
in  other  parts  of  the  world  at  the  same  time  as  our  event. 

Despite  these  difficulties,  IPMM'99  has  been  successful  in  attracting  over  200  papers  and  about  230 
delegates.  The  list  of  plenary  speakers  contains  some  very  important  people  in  the  field  of  soft  computing 
and  intelligent  methods  and  we  are  proud  to  have  their  participation.  Both  financial  and  logistic  support 
from  organizations  such  as  the  US  Army  Research  Office,  Wright-Patterson  Air  Force  Base,  Oak  Ridge 
National  Laboratory  and  IEEE-SMC  has  been  strong  and  is  greatly  appreciated.  Our  Organizing  Committee 
has  worked  tirelessly  to  recruit  papers  and  promote  the  conference  and  we  acknowledge  their  efforts. 

This  year,  two  Awards  will  be  made  in  memory  of  two  departed  colleagues  who  contributed  in  many  ways 
to  the  startup  of  IPMM  —  Dr.  J.  Keith  Brimacombe  of  the  University  of  British  Columbia  and  Dr.  Iqbal 
Ahmad  of  the  US  Army  Research  Office  in  North  Carolina.  The  award  recipients  were  chosen  by  a  select 
committee  based  on  their  contributions  to  IPMM'99.  These  awards  are: 

The  J.  Keith  Brimacombe  Award  for 
Excellence  in  Cross-Disciplinary  Research  in 
Intelligent  Processing  and  Manufacturing  of  Materials 

The  Iqbal  Ahmad  Award  for  the  Best  Student  Paper  in  IPMM 

We  are  considering  a  third  conference.  IPMM-2001  is  being  discussed  as  a  possibility  to  be  held  in 
Vancouver,  Canada,  in  Cupertino,  California  or  in  Europe.  Meetings  are  scheduled  at  IPMM'99  to 
formalize our plans and to make final decisions. Keep up to date about future IPMM activities by visiting
our web site at <http://mining.ubc.ca/ipmm/>




IPMM'99 
Special  Events 


Saturday, July 10, 1999
Workshop Tutorials

TW2: Design of Experiments as a Precursor to Neural Networks  900 - 1230
TW4: Using DynaFLeX (DFX) to Create Intelligent Web Sites  Cancelled
TW6: Fuzzy Data Mining and Expert System Development  1400 - 1730
TW8: Artificial Immune Systems: a New Frontier in AI  1400 - 1730

Sunday, July 11, 1999
Workshop Tutorials

TW1: Fuzzy Sets and Evolutionary Strategies in Eng. Design  900 - 1230
TW3: Data Mining Using Artificial Neural Networks  Cancelled
TW5: Fuzzy Logic and Data Mining - Methods and Applications  1400 - 1730
TW7: Imaging Procedures for Particles and Particulate Solids  900 - 1230
TW9: Distributed Intelligent Control/Simulation of Large-Scale Systems  1400 - 1730

Organizing Committee Luncheon Meeting  1230 - 1330
Opening Reception  1800 - 1930

Monday, July 12, 1999
Author's Breakfast  730 - 830
Morning Break  1030 - 1100
Conference Buffet Lunch  1230 - 1330
Afternoon Break  1530 - 1600

Tuesday, July 13, 1999
Author's Breakfast  730 - 830
Morning Break  1030 - 1100
Conference Buffet Lunch  1230 - 1330
Afternoon Break  1530 - 1600
No-host Reception  1900 - 1930
Conference Banquet  1930 - 2230

Wednesday, July 14, 1999
Author's Breakfast  730 - 830
Morning Break  1030 - 1050
Free Time  1230 - end of day
Tour of the Hawaiian Electric Co.'s Electric Vehicle Project  1330 - 1630
Additional Workshops  1330 - 1730

Thursday, July 15, 1999
Author's Breakfast  730 - 830
Morning Break  1010 - 1030
Conference Buffet Lunch  1230 - 1330
Afternoon Break  1500 - 1530
Wrap-up Panel Discussion  1630 - 1700



IPMM'99 

Companion's  Program 


Sunday, July 11, 1999
Opening Reception  1800 - 1930

Monday, July 12, 1999
Sea Life Park (Splash-U optional)  830 - 1530

Tuesday, July 13, 1999
Royal Circle Island Tour of Oahu  730 - 1630
No-host Reception  1900 - 1930
Conference Banquet  1930 - 2230

Wednesday, July 14, 1999
Mai Ta'i Trade Winds Sail  1130 - 1530
Hawaiian Electric Co.'s Electric Vehicle Project (alternate)  1330 - 1630

Thursday, July 15, 1999
Oahu Coastal Cruise into Honolulu and Pearl Harbor  745 - 1230


Additional Individual Tours (not included in Companion Program)

Monday, July 12th  Three-Star Sunset Dinner Cruise  1730 - 2030
Monday, July 12th  Paradise Cove Lu'au - Royal Ali’i  1615 - 2130
Wednesday, July 14th  Pacific Splash Barefoot Fun Cruise  1230 - 1700
Wednesday, July 14th  Magic of Polynesia Dinner Show  1700 - 1930


Post-Conference Tours (one day sightseeing trips by air and coach)

Big Island of Hawaii  500 - 2030
Maui  530 - 1930
Kaua'i  530 - 1900


All  tours  arranged  by: 

La  Rena  Saul,  Sales  Manager, 

Classic  Destination  Management-Hawaii 
2255  Kuhio  Avenue,  Suite  1620, 
Honolulu,  Hawaii,  96815-2656 
Phone:  800-367-2333  ext.  226 
Phone:  808-971-2700  ext.  226 
FAX:  808-922-2606 
Email: saul@classichawaii.com




IPMM'99

The  Second  International  Conference  on 
Intelligent  Processing  and  Manufacturing  of  Materials 

Volume  1 
Contents 


Plenary  Presentations  1 

From  Computing  with  Numbers  to  Computing  with  Words:  3 

From  Manipulation  of  Measurements  to  Manipulation  of  Perceptions 
Lotfi  A.  Zadeh 

Berkeley  Initiative  in  Soft  Computing  (BISC) 

Computer  Science  Division  and  the  Electronics  Research  Laboratory, 

Department  of  Electrical  Engineering  and  Computer  Science, 

The  University  of  California,  Berkeley,  California,  USA 

Hybrid  Modeling  for  Testing  Intelligent  Software  for  Lunar-Mars  5 

Closed  Life  Support 
Jane  T.  Malin 

Intelligent  Systems  Branch,  Automation,  Robotics  and  Simulation  Division 
NASA  Johnson  Space  Center,  Houston,  Texas,  USA 

Image Analysis and Vision Systems for Processing Plants  11

Antti J. Niemi, Heikki Hyotyniemi*, and Raimo Ylinen**

Helsinki University of Technology, Control Engineering Laboratory
P.O. Box 5400, FIN-02015 HUT, Finland
University of Oulu, Systems Engineering Laboratory,

P.O. Box 4300, FIN-90401 Oulu, Finland

Progress  in  Japan’s  Intelligent  Manufacturing  Systems  Research  Program  21 
Yuji  Furukawa 

Tokyo  Metropolitan  University,  Minami-Osawa,  Hachioji,  Tokyo,  Japan 

Analysis  of  Processes  and  Large  Data  Sets  by  a  Self-Organizing  Method  27 
Teuvo  Kohonen 

Helsinki  University  of  Technology,  Neural  Networks  Research  Centre, 

P.O.  Box  2200,  FIN-02015  HUT,  Finland 

Rough Set Theory for Intelligent Industrial Applications  37
Zdzislaw Pawlak

Institute of Theoretical and Applied Informatics,

Polish Academy of Sciences, Poland




From  Fuzzy  Set  Theory  to  Computational  Intelligence  -  45 

Special  European  Experiences 
Hans-Juergen  Zimmermann 

Aachen  University  of  Technology, 

RWTH,  Institute  of  Operations  Research,  Aachen,  Germany 

Telemining™ Systems Applied to Underground Hard Rock  53

Metal  Mining  at  Inco  Limited 
Gregory  R.  Baiden 

INCO  Mines  Research,  Sudbury,  Ontario,  Canada 

Soft  Sensors  for  Processing  Plants  59 

Guillermo  D.  Gonzalez 

Department  of  Electrical  Engineering,  University  of  Chile,  Santiago,  Chile 


J.  Keith  Brimacombe  Memorial  Symposium: 

Intelligence  in  Materials  Engineering  71 

In  Memory  of  J.  Keith  Brimacombe:  73 

The  Pursuit  of  Quality  in  the  Casting  of  Materials 
Indira  V.  Samarasekera 

The  Centre  for  Metallurgical  Process  Engineering,  The  J.K.  Brimacombe  Advanced 
Materials  and  Process  Engineering  Laboratory  (AMPEL), 

The  University  of  British  Columbia,  Vancouver,  Canada 

Towards  Intelligent  Steel  Processing  75 

Rian  J.  Dippenaar 

BHP  Institute  for  Steel  Processing  and  Products, 

The  University  of  Wollongong,  Wollongong,  New  South  Wales,  Australia 

Computer  Simulation  and  Information  Management  Systems  85 

for  Material  Processing 
Yoshiyuki  Nagasaka 

Department  of  Distribution  Science,  Osaka  Sangyo  University,  Osaka,  Japan 

Simulation  of  Springback  with  the  Draw/Bend  Test  91 

Kaiping  Li,  Lumin  Geng,  Robert  H.  Wagoner 

Dep't.  Materials  Science  and  Engineering,  Ohio  State  University,  Columbus,  Ohio 

Development  of  an  Integrated  System  for  Designing  105 

Steelmaking  Aim  Compositions 

P.A.  Manohar*,  S.S.  Shivathaya**,  M.  Ferry*,  T.  Chandra* 

*  Dep't.  of  Materials  Engineering.,  University  of  Wollongong,  NSW,  Australia 
**  Hawker  de  Havilland  Ltd.,  Bankstown,  Australia 




A  SCADA-based  Expert  System  to  Provide  Delay  Strategies  for  a  111 

Steel  Billet  Reheat  Furnace 

Clifford  Mui*,  John  A.  Meech**,  Peter  Barr** 

*  Dynapro  Systems  Inc.,  Vancouver,  B.C.,  Canada 
**  The  Centre  for  Metallurgical  Process  Engineering, 

The  University  of  British  Columbia,  Vancouver,  B.C.,  Canada 

Simulation and Analysis of Thin Strip Casting Processes  119

Yogeshwar  Sahai,  Manish  Gupta 

Dep't.  Materials  Science  and  Engineering,  Ohio  State  University,  Columbus,  Ohio 

Intelligent  Manufacturing  I  129 

Agent-Based Control of Manufacturing Systems  131

Laszlo  Monostori,  B.  Kadar 

Computer  Automation  Institute,  Hungarian  Academy  of  Sciences, 

Budapest,  Hungary 

Intelligent  Database  Support  for  Manufacturing  and  Processing  of  139 

Industrial  Materials 

Sylvanus  A.  Ehikioya,  E.G.  Truelove,  Thomas  T.  Tran 

Brandon  University,  Brandon,  Manitoba,  Canada 

Intelligent  Production  Management  in  Mining  Systems  145 

Sean  Dessureault,  Malcolm  Scoble,  Scott  Dunbar 

Dep't.  of  Mining  and  Mineral  Process  Engineering, 

University  of  British  Columbia,  Vancouver,  B.C.,  Canada 

Intelligent Quality Control for the Food Industry  151

using  a  Fuzzy-Fractal  Approach 
Oscar  Castillo,  Patricia  Melin 

Tijuana  Institute  of  Technology,  Chula  Vista,  California 

Design  Tool  for  Assessing  Manufacturing  Environments  157 

Daniel  A.  Holder*,  Raymond  D.  Harrell*,  Daniel  Rochoviak**, 

Phillip  Farrington**,  Dawn  Russell**,  John  Rogers**,  Sherri  Messimer** 

*  US  Army  AMCOM,  Redstone  Arsenal,  Alabama,  USA 
**  University  of  Alabama  in  Huntsville,  Alabama,  USA 

Models,  Algorithms  and  Decision  Support  Systems  for  Letter  Mail  Logistics  163 
Hans-Jürgen Sebastian

RWTH Aachen, Operations Research Group, Aachen, Germany

Intelligent  Processes  for  Production  Control  165 

Edson  Pacheco  Paladini 

Universidade  Federal  de  Santa  Catarina,  Florianopolis,  SC,  Brasil 



Fuzzy Systems I  171

Industrial Applications of Fuzzy System Modeling  173
I. Burhan Turksen

University of Toronto, Canada

From Intelligent Models to Smart Ones  179
Heikki Hyotyniemi

Helsinki University of Technology, Espoo, Finland

A Fuzzy Design Evaluation Based on a Taguchi Quality Approach  185
A. Donnarumma*, N. Cappetti*, M. Pappalardo*, Esamuele Santoro**

* Università di Salerno, Italy
** Università di Napoli, Italy

Non-Traditional Performance Analysis  191
J. Arlen Cooper

Sandia National Laboratories, Albuquerque, New Mexico, USA

Methods of Creating Membership Functions for Fuzzy Rules  195
in Knowledge Bases
Cezary Orlowski

Technical University of Gdansk, Gdansk, Poland

An Efficient Method for Constructing Fuzzy Rules  201
Bojan Novak

University of Maribor, Stajerska, Slovenia

Fuzzy Clustering Model Based on Changes in Vagueness  207
Mika Sato-Ilic

University of Tsukuba, Ibaraki, Japan

Thin Films and Surface Processing  213

Modeling and Control of Optical Interference Filters Using Plasma  215
Assisted Chemical Vapor Deposition
Derek A. Linkens*, M.F. Abbod*, J. Metcalfe**, B. Nichols**

* University of Sheffield, Sheffield, U.K.
** GEC-Marconi Limited, Caswell, U.K.

A Study of Mechanical Properties of Multi-Layered Thin Films  221
T. Hirasawa, H. Kotera, T. Yamamoto, Y. Sakamoto, S. Shima

Kyoto University, Sakyo-ku, Kyoto, Japan

Foundations of Micro-Machining  227
Juergen Leopold

Institute of Tool Engineering and Quality Management, Chemnitz, Germany



Design  of  Novel  Smoothing  by  Atomic  Layer  Epitaxy  for  233 

Microstructure  Fabrication 

S.  Hirose*,  A.  Yoshida**,  M.  Yamaura**,  H.  Munekata  ** 

*Mechanical Engineering, AIST, MITI, Tsukuba, Ibaraki, Japan
**Tokyo  Institute  of  Technology,  Midori-ku,  Yokohama,  Japan 

Study  of  the  Relationship  Between  Groove  Cross  Sectional  Area  239 

per  Pulse  of  Q-Switched  Yag  Laser  and  Strength  of  Processing  Sound 

T.  Kurita,  T.  Ono 

Tokyo  Metropolitan  Institute  of  Technology,  Tokyo,  Japan 

Optimization  of  Thickness  Distribution  of  Micro-Membrane  245 

by  Genetic  Algorithm 

Hidetoshi  Kotera,  Y.  Sakamoto,  T.  Hirasawa  and  S.  Shima 

Kyoto  University,  Sakyo-ku,  Kyoto,  Japan 

Manufacturing of Metallic Prototypes and Tools by  251

Laser  Cutting  and  Diffusion  Bonding 
S.  Sandig,  P.  Wiesner 

Dep't. of Mechanical Engineering, Technical University of Ilmenau, Germany

Evolutionary  Systems  and  Machine  Learning  255 

Artificial  Immune  Systems:  a  New  Frontier  in  Artificial  Intelligence  257 

Dipankar  Dasgupta*,  Stephanie  Forrest  ** 

*  University  of  Memphis,  Tennessee,  USA 

**  University  of  New  Mexico,  Albuquerque,  NM,  USA 

Inductive  Learning  for  Optimization  of  Simulation  Model  Output  269 

Rainer  Barton*,  Helena  Szczerbicka  ** 

*  German  Aerospace  Center  (DLR),  Institute  for  Flight  Mechanics, 

Braunschweig,  Germany 

**  University  of  Bremen,  Bremen,  Germany 

A  Genetically  Optimised  Fuzzy  Parser  of  Natural  Language  277 

Olgierd  Unold 

Wroclaw  University  of  Technology,  Wroclaw,  Poland 

A  Genetic  Algorithm-based  Approach  to  Solve  Process  Plan  Selection  281 

Problems 

K.M.  Tiwari*,  S.K.  Tiwari*,  Debjit  Roy*,  N.K.  Vidyarthi  **. 

*  Manufacturing  Engineering,  National  Institute  of  Foundry  and  Forge 
Technology,  Hatia,  Ranchi,  India 

**  Mechanical  Engineering,  NERIST,  Nirjuli,  Itanagar,  India 
***  SriVenkateshNagar,  Chennai-600092,  India 




Breeding  Policies  in  Evolutional  Approximation  of  Optimal  Subspace  285 

H.M.  Huang  and  P.L.  Leung 

City  University  of  Hong  Kong,  Kowloon,  Hong  Kong 

Prediction  of  Cement  Paste  Mechanical  Behaviour  from  Chemical  291 

Composition  using  Genetic  Algorithms  and  Artificial  Neural  Networks 
Jose  C.  Cassa,  Giovanni  Floridia,  Andre  R.  Souza,  Rodrigo  T.  Oliveira 

Universidade  Federal  da  Bahia,  Salvador,  Bahia,  Brazil 

Rough  Sets-based  Machine  Learning  Using  a  Binary  Discernability  Matrix  299 
Reynaldo  Felix,  Toshimitsu  Ushio 

Systems  and  Human  Science,  Osaka  University,  Toyonaka,  Japan 

Intelligence  in  the  Design  of  Materials  and  Processes  I  307 

INTELLIGOLD  -  An  Expert  System  For  Gold  Plant  Process  Design  309 

Vanessa  Torres*,  Arthur  Torres**,  John  A.  Meech  *** 

*Companhia  Vale  do  Rio  Doce,  Belo  Horizonte,  Brazil 

**  University  of  Sao  Paulo,  SP,  Brazil 

***  University  of  British  Columbia,  Vancouver,  Canada 

A  Hardware  Design  for  Real-Time  Multiple  Target  Tracking  317 

Frederick  Ferguson,  Chandra  Curtis 

North  Carolina  A&T  State  University,  Greensboro,  NC,  USA 

Low-Cost  Supersonic  Missile  Inlet  Fabrication  Technique  325 

C.S.  Cornelius,  D.A.  Gibson 

US  Army  Aviation  and  Missile  Command,  Redstone  Arsenal,  AL 

Design  of  High  Performance  Missile  Structures  Utilizing  331 

Advanced  Composite  Material  Technologies 
J.R.  Esslinger,  R.N.  Evans,  G.W.  Snyder 

US  Army  Aviation  and  Missile  Command,  Redstone  Arsenal,  AL 

Modelling  the  Mechanical  Stability  of  Metal  Catalyst  Carriers  339 

C.  Guist,  H.  Bode 

Bergische  Universitat-Gesamthochschule  Wuppertal,  Germany 

Integration  of  Newly  Developed  AI  Assembly,  Production,  347 

and  Material  Flow  Virtual  Tools 

Daniel  A.  Holder*,  Raymond  D.  Harrell*,  Terri  L.  Calton**, 

John  F.  Atkinson*,  Brandy  M.  Brasfield* 

*  US  Army  AMCOM,  Redstone  Arsenal,  Alabama 
**  Sandia  National  Laboratories,  Albuquerque,  NM 




Prediction  of  Materials  Properties  353 

How  ab-initio  Computer  Simulation  Can  Predict  355 

Materials  Properties  Before  Experiment 
Yoshiyuki  Kawazoe 

Tohoku  University,  Sendai,  Japan 

Data  Driven  Knowledge  Extraction  of  Materials  Properties  361 

J.S.  Kandola,  S.R.  Gunn,  I.  Sinclair,  P.A.S.  Reed

University  of  Southampton,  U.K. 

A  Quantum  Neural  Net:  with  Applications  to  Materials  Science  367 

B.  Igelnik*,  M.  Tabib-Azar*,  Y.-H.  Pao*,  and  S.  R.  LeClair  ** 

*Case  Western  Reserve  University,  Cleveland,  OH,  USA 
**  Material  Directorate,  Wright  Laboratory,  Fairborn,  OH,  USA 

Ontology  for  Phase  Diagram  Databases  373 

N.  Ono,  R.  Kainuma,  H.  Ohtani,  K.  Ishida,  M.  Kato 

Tohoku  University,  Sendai,  Japan 

Prediction  of  Concrete  Mechanical  Behaviour  from  Data  381 

at  Lower  Ages  using  Artificial  Neural  Networks 

Jose  C.  Cassa,  Giovanni  Floridia,  Andre  R.  Souza,  Rodrigo  T.  Oliveira 

Universidade  Federal  da  Bahia,  Salvador,  Bahia,  Brazil 

Improving  the  Prediction  Accuracy  of  a  Constitutive  Model  389 

with  ANN  Models 

L.X.  Kong  and  P.D.  Hodgson 

Deakin  University,  Geelong,  Victoria,  Australia. 

Hybrid  Fuzzy  Modelling  Using  Simulated  Annealing:  395 

Application  to  Materials  Property  Prediction 
Min-You  Chen,  Derek  A.  Linkens 

The  University  of  Sheffield,  Sheffield,  UK 

Intelligence  in  Materials  Science  I  401 

Inorganic  Glasses:  Old  and  New  Structures  on  the  Eve  of  the  21st  Century  403 

J.  Sestak*,  B.  Hlavacek**,  N.  Koga***

*  Czech  Academy  of  Sciences,  Prague,  Czech  Republic
**  University  of  Pardubice,  Pardubice,  Czech  Republic
***  Hiroshima  University,  Higashi-Hiroshima,  Japan

Oxygen  Solubility  Modeling  in  Aqueous  Solutions  411 

Desmond  Tromans 

University  of  British  Columbia,  Vancouver,  B.C.,  Canada




On  the  Oxidation  of  Steel  in  CO2  and  Air  417 

Gity  Samadi  Hosseinali,  Ainul  Akhtar 

Powertech  Labs  Inc.,  Surrey,  British  Columbia,  Canada 

Retardation  of  Hydrogen  Embrittlement  by  Electrolytic  ZrO2  423

Coating  of  AISI  430  Stainless  Steel 
I.B.  Huang,  S.K.  Yen 

National  Huwei  Institute  of  Technology,  Huwei,  Taiwan 

The  Effect  of  Ca  Addition  on  Viscosity  and  Electrochemical  429 

Properties  of  Mg-Alloys  Produced  by  Casting

H.S.  Kim*,  Shuji  Hanada*,  Ha-Guk  Jeong*,  Dong-Wha  Kum  ** 

*  Tohoku  University,  Sendai,  Japan 

**  Korea  Institute  of  Science  and  Technology,  Seoul,  Korea 

Bio-Compatible  Ceramics  as  Mimetic  Material  for  Bone  Tissue  Substitution  431
Zdenek  Strnad*,  Jaroslav  Sestak  ** 

*Lab.  for  Glass  and  Ceramics  (LASAK),  Prague,  Czech  Republic 
**Czech  Academy  of  Sciences,  Prague,  Czech  Republic 

Intelligent  Design  of  GaSb  doped  Single  Crystals  437

B.  Stepanek,  J.  Sestak,  J.J.  Mares,  J.  Kristofik,  V.  Sestakova,  P.  Hubik

Academy  of  Sciences  of  the  Czech  Republic,  Semiconductor  Department,  Prague,
Czech  Republic


Intelligent  Image  Analysis  Applications  443 

Astronomical  Image  Processing  -  Applications  To  445 

Ultra-Faint  Imaging  of  Small,  Moving,  Solar  System  Bodies: 

Comets  and  Near-Earth  Objects 
Karen  J.  Meech 

University  of  Hawaii,  Institute  for  Astronomy,  Honolulu,  HI,  USA 

A  High  Performance  Computing  Algorithm  for  Improving  447 

In-Line  Holography 
Hesham  Eldeib 

Electronic  Research  Inst.,  National  Research  Center,  Giza,  Egypt 

Human  Face  Detection  System  by  KenzanNET  with  453 

Preprocess  Analyzing  of  Hyperspectral  Image 

Takakazu  Chashikawa*,  Keizo  Fujii**,  Yoshiyasu  Takefuji  * 

*  Keio  University,  Kanagawa,  Japan. 

**NITTAN  Co.,  Ltd.,  Japan 




Using  Image  Analysis  and  Partial  Least  Squares  Method  to  459 

Estimate  Mineral  Concentrations  in  Mineral  Flotation 

Jari  Hatonen*,  Heikki  Hyotyniemi*,  J.  Miettunen**,  L.-E.  Carlsson***

*  Helsinki  University  of  Technology,  Espoo,  Finland
**  Outokumpu  Mining  Oy,  Pyhasalmi  Mine,  Pyhasalmi,  Finland
***  Boliden  Mineral  AB,  Mineral  Processing,  Boliden,  Sweden

A  Combined  Morphological  and  Color-Based  Approach  to  465 

Characterize  Flotation  Froth  Bubbles 

Giuseppe  Bonifazi,  Silvia  Serranti,  F.  Volpe,  R.  Zuco 

Universita  degli  Studi  di  Roma,  "La  Sapienza",  Italia 

Robust  Bubble  Delineation  Algorithm  for  Froth  Images  471 

Weixing  X.  Wang,  O.  Stephansson 

Department  of  Civil  and  Environmental  Engineering, 

Royal  Institute  of  Technology,  Stockholm,  Sweden 

The  Characterization  of  Flotation  by  Colour  Information  and  477 

Selecting  the  Proper  Equipment 
A.K.  Siren

VTT  Information  Technology,  Espoo,  Otaniemi,  Finland


Intelligence  in  Environmental  Applications  479 

Robust  Engineering  Approaches  to  Maximize  Results  in  Business,  481 

Cost,  Engineering,  Human,  Quality  and  System  Technologies 
Roberto  C.  Villas  Boas 

CYTED  -  Science  and  Technology  for  Development  in  Iberoamerica,  Mineral
Technology  Sub-Program,  Madrid,  Spain 

Imaging  Techniques  for  Process  Optimization  and  Control  485 

in  Glass  Recycling 

Giuseppe  Bonifazi,  Paolo  Massacci 

Ingegneria  Chimica,  dei  Materiali,  Materie  Prime  e  Metallurgia, 

Universita  degli  Studi  di  Roma  "La  Sapienza",  Roma,  Italia 

Application  of  Heuristic  Modeling  in  Natural  Resource  Sciences  491 

Steven  Mackinson 

Fisheries  Centre,  University  of  British  Columbia, 

Vancouver,  B.C.,  Canada 

ARDEX  -  A  Fuzzy  Expert  System  for  ARD  Site  Remediation  499 

Judita  Balcita,  John  A.  Meech 

University  of  British  Columbia,  Vancouver,  B.C.,  Canada 




Modeling  of  Gold  Heap  Leaching  for  Criteria  of  Sustainability  Targets  505 
Luiz  R.  P.  De  Andrade  Lima*,  Roberto  C.  Villas-Boas** 

*  Federal  University  of  Bahia,  Salvador,  BA,  Brazil

**  Center  for  Mineral  Technology,  Rio  de  Janeiro,  RJ,  Brazil 

Design  Optimisation  of  Aluminium  Recycling  Using  the  Taguchi  Approach  513 
A.R.  Khoei,  D.T.  Gethin,  I.  Masters

Mechanical  Engineering,  University  of  Wales  Swansea,  UK 

Towards  a  Better  Understanding  of  Environmental  Science  through  519 

Application  of  Fuzzy  Sets 

Mory  M.  Ghomshei,  John  A.  Meech 

University  of  British  Columbia,  Vancouver,  B.C.,  Canada

Intelligence  in  Rolling  Processes  527 

Data  Mining  and  State  Monitoring  in  Hot  Rolling  529 

L.  Cser*,**,  A.S.  Korhonen**,  P.  Mantyla***,  O.  Simula**,  J.  Ahola**

*  Bay  Zoltan  Institute  for  Logistics  and  Production  Technology, 

Miskolc-Tapolca,  Hungary 

**  Helsinki  University  of  Technology,  Espoo,  Finland
***  Rautaruukki  Steel,  Raahe,  Finland 

Determination  of  Thickness  Control  Parameters  of  Rolling  537 

Processes  by  the  Sensitivity  Method,  using  Neural  Networks 
Luis  E.  Zarate*,  Horacio  Helman  ** 

*  Departamento  de  Ciencia  da  Computacao,  Pontificia  Universidade
Catolica  de  Minas  Gerais,  Belo  Horizonte,  Brazil

**  Departamento  de  Engenharia  Metalurgica  e  de  Materiais,

Universidade  Federal  de  Minas  Gerais,  Belo  Horizonte,  Brazil

AI  Approach  to  Modeling  Rolling  Loads  in  Design  of  543 

Cold  Rolling  Processes 

J.  Kusiak*,  J.G.  Lenard**,  K.  Dudek* 

*  Akademia  Gorniczo-Hutnicza,  Krakow,  Poland

**  University  of  Waterloo,  Waterloo,  Ontario,  Canada 

Direct  Determination  of  Sequences  of  Passes  for  Strip  Rolling  549 

Process  by  Means  of  Fuzzy  Logic  Rules 
C.D.M.  Pataro,  H.  Helman

Universidade  Federal  de  Minas  Gerais,  Belo  Horizonte,  Brazil

Elongation-Controlled  Rolling  of  H-Shaped  Wire  555 

H.  Utsunomiya,  M.  Shinkawa,  F.  Shimaya,  Y.  Saito 

Materials  Science  and  Engineering,  Osaka  University,  Japan. 




Application  of  a  Neural  Network  to  Speed  Up  a  Mathematical  561 

Model  for  Calculation  of  Strip  Profiles  in  Flat  Rolling 
Yukio  Shigaki,  Horacio  Helman 

Universidade  Federal  de  Minas  Gerais,  Belo  Horizonte,  Brazil 


Intelligent  Methods  in  Metal  Forming  Processes  563 

A  Fundamental  Study  of  the  Incremental  Deep  Drawing  Process  565 

S.  Shima,  H.  Kotera,  K.  Kamitani,  S.  Nagatomo 

Mechanical  Engineering,  Kyoto  University,  Kyoto,  Japan 

Intelligent  Design  Architecture  for  Process  Control  of  Deep-Drawing  571 

K.  Manabe*,  H.  Koyama*,  K.  Katoh**,  S.  Yoshihara***

*  Mechanical  Engineering,  Tokyo  Metropolitan  University,  Japan
**  Integrated  Systems  Japan,  Ltd.,  Tokyo,  Japan

***  Tokyo  National  College  of  Technology,  Tokyo,  Japan 

An  Iterative  Approach  to  Determine  Heat-Treatment  and  577 

Composition  from  the  Mechanical  Yield  Strength  of  an  Al-Li  Alloy 
James  M.  Fragomeni

Ohio  University,  Mechanical  Engineering,  Athens,  Ohio,  USA 

A  Design  of  Experiments  Statistical  Approach  to  Determine  585 

the  Effect  of  Extrusion  Process  Variables  on  the  Mechanical 
Properties  of  a  Heat-Treated  Al-Li  Alloy 
James  M.  Fragomeni 

Ohio  University,  Mechanical  Engineering,  Athens,  Ohio,  USA 

Control  of  Liquid  Segregation  of  Semi-Solid  Al-Alloys  during  593

Intelligent  Compression  Testing 
C.G.  Kang,  K.D.  Jung,  H.K.  Jung 

Pusan  National  University,  Mechanical  Engineering,  Korea 

Adaptability  to  Frictional  Change  of  Fuzzy  Adaptive  Blank  601 

Holder  Control  for  Deep  Drawing 
S.  Yoshihara*,  K.  Manabe**,  H.  Nishimura  ** 

*  Tokyo  National  College  of  Technology,  Mechanical  Eng.,  Japan 
**  Tokyo  Metropolitan  University,  Tokyo,  Japan 

An  AI  Process  Control  System  with  Simulation  Database  and  607 

Adaptive  Filter  for  V-Bending 

M.  Yang*,  A.  Katayama*,  K.  Manabe*,  N.  Aikawa  ** 

*  Dep't.  of  Mechanical  Engineering,  Tokyo  Metropolitan  University,  Japan 
**  Tokyo  Engineering  University,  Japan 



Intelligent  Manufacturing  II  613

The  Distributed  Intelligent  Control  of  Complex  Systems  615

Wayne  J.  Davis

University  of  Illinois  at  Urbana-Champaign,

Department  of  General  Engineering,  Urbana,  IL,  USA

PDM-based  Virtual  Enterprises  -  Bridging  the  Semantic  Gap  623

A.  Karcher,  J.  Wirtz

Dep't.  of  Mechanical  Engineering,  Technical  University  of  Munich,

Garching,  Germany

A  Methodology  to  Diagnose  the  Target  Cost  in  a  Manufacturing  Process  629

A.  Arioti,  C.  Fantozzi,  M.  Granchi,  E.  Vettori

Mechanical,  Nuclear  and  Manufacturing  Engineering,

University  of  Pisa,  Italy

Resource  Allocation  for  a  Fast-Tracked  Project  635

Yassiah  Bissiri,  Scott  Dunbar

Department  of  Mining  and  Mineral  Process  Engineering,

University  of  British  Columbia,  Vancouver,  B.C.,  Canada

Hybrid  Simulation  Objects  using  Fuzzy  Set  Theory  for  641

Simulation  of  Innovative  Process  Chains

T.  Menzel,  M.  Geiger

Dep't.  of  Manufacturing  Technology,

University  Erlangen-Nuremberg,  Erlangen,  Germany

Manufacturing  Management  Improvement  through  649

Rapid  Production  of  Budgets

E.J.  Colville

School  of  Engineering,  University  of  Tasmania,

Hobart,  Tasmania,  Australia

A  Connectionist  Method  to  Solve  Job  Shop  Problems  655

Marko  Fabiunke,  Gerd  Kock

GMD  Research  IT  Center  (FIRST),  Berlin,  Germany

Fuzzy  Systems  II  661

Designing  in  Many-Valued  Logic  663

A.  Donnarumma  and  Michele  Pappalardo

University  of  Salerno,  Mechanical  Engineering,  Fisciano,  Italy



Modulus  Genetic  Algorithm  and  its  Application  to  669 

Fuzzy  System  Optimization 
Sinn-Cheng  Lin 

Dep't.  of  Educational  Media  and  Library  Sciences, 

Tamkang  University,  Tamsui,  Taipei  Hsien  Taiwan,  PRC 

Fuzzy  Evolutionary  Programming  for  Portfolio  Selection  675 

in  Investment  Decisions 
T.  Van  Le 

Faculty  of  Information  Sciences  and  Engineering, 

University  of  Canberra,  Belconnen,  Australia. 

Design  of  a  Region-Wise  Fuzzy  Sliding  Mode  Controller  with  Fuzzy  Tuner  681 
C.C.  Kung,  W.C.  Lai 

Dep't.  of  Electrical  Engineering, 

Tatung  Institute  of  Technology,  Taipei,  Taiwan 

A  Multi-Input  Current-Mode  Fuzzy  Integrated  Circuit  687 

for  Pattern  Recognition 
Gu  Lin,  Bingxue  Shi 

Institute  of  Microelectronics,  Tsinghua  University,  Beijing,  PRC 

A  Framework  for  Intelligent  Systems  based  on  695 

Vector-Annotated  Logic  Programs 

Kazumi  Nakamatsu*,  Yumi  Hasegawa*,  Jair  Minoro  Abe**, 

Atsuyuki  Suzuki  ***. 

*Himeji  Institute  of  Technology,  School  of  Humanity, 

Environment  Policy  and  Technology,  Himeji,  Hyogo,  Japan 
**Paulista  University,  Sao  Paulo,  Brazil. 

***Shizuoka  University,  Japan 

A  Fuzzy  Logic  Assisted  Electrodynamic  Balance  for  Unit  703 

Operations  on  Single  Levitated  Particles 
M.  Pappalardo*,  A.  Pellegrino*,  M.  d'Amore**, 

P.  Giordano**,  P.  Russo  ** 

*  Dep't.  of  Mechanical  Engineering,  University  of  Salerno, 

Fisciano,  Italy 

**  Dep't.  of  Chemical  and  Food  Engineering,  University  of  Salerno,

Fisciano,  Italy 


Author's  Index 


1-1 




IPMM'99

The  Second  International  Conference  on 
Intelligent  Processing  and  Manufacturing  of  Materials 

Volume  2 

Contents 


Artificial  Neural  Networks  I  711 

Artificial  Neural  Networks  (ANN)  as  Simulators  and  Emulators:  713 

An  Analytical  Overview 
Guy  M.  Nicoletti 

University  of  Pittsburgh  at  Greensburg,  Pennsylvania,  USA 

Logical  Rule  Extraction  from  Data  by  Maximum  Neural  Networks  723 

T.  Saito,  Y.  Takefuji 

Keio  University,  Fujisawa,  Kanagawa,  Japan 

Iterative  RBF  Neural  Networks  as  Metamodels  of  Stochastic  Simulations  729 
George  Meghabghab,  George  Nasr 

Dep't.  of  Mathematics  and  Computer  Science, 

Valdosta  State  University,  Valdosta,  Georgia,  USA 

A  Systematic  and  Reliable  Approach  to  Pattern  Classification  735 

R.  Doraiswami,  M.  Stevenson,  S.  Rajan

Department  of  Electrical  Engineering,  University  of  New  Brunswick, 

Fredericton,  New  Brunswick,  Canada 

Dynamic  Associative  Memory  Using  Chaotic  Neural  Networks  743 

Yoshihisa  Fukuhara,  Yoshiyasu  Takefuji 

Keio  University,  Graduate  School  of  Media  and  Governance,  Fujisawa, 

Kanagawa,  Japan 

Trends  in  Intelligent  Process  Control  in  the  Primary  Aluminium  Industry  749 
R.T.  Bui*,  L.  Tikasz*,  J.  Perron  ** 

*  Universite  du  Quebec  a  Chicoutimi,  Chicoutimi,  Quebec,  Canada 
**  Alcan  International  Ltd,  Jonquiere,  Quebec,  Canada 

Modeling  of  the  Flow  Stress  Relationship  using  a  BP  Network  755 

Y.  Y.  Yang,  Derek  A.  Linkens 

University  of  Sheffield,  Sheffield,  U.K. 




Intelligence  in  Materials  Science  II  763 

The  Heredity  and  Control  of  Microstructures  of  Liquid  Metals  765 

During  Rapid  Cooling  Processes 
Rang-su  Liu,  Ji-yong  Li,  Hai-rong  Liu 

*Department  of  Physics,  Hunan  University,  Changsha,  P.R.  China 
**  Department  of  Chemistry,  Hunan  University,  P.R.  China 

AI  Approach  to  Internal  Variable-based  Rheological  Model  for  Steels  773 

J.  Kusiak,  M.  Pietrzyk 

Department  of  Metallurgy  and  Materials  Engineering, 

Akademia  Gorniczo-Hutnicza,  Krakow,  Poland

Electrolytic  ZrO2  Coating  on  Co-Cr-Mo  Implant  Alloys  of  Hip  Prosthesis  779
S.K.  Yen,  H.Z.  Zan,  M.J.  Guo 

National  Chung  Hsing  University,  Taichung,  Taiwan 

Automated  Stress  Control  of  Electroplated  Nickel-Phosphorus  Alloy  785 

G.T.  Yu,  M.  Wiliams 

National  Huwei  Institute  of  Technology,  Materials  Eng.,  Taiwan 

Mechanism  of  Electrolytic  Coating  of  Al2O3  on  MAR-M247  Superalloy  789

S.K.  Yen,  C.C.  Chang 

Dep't.  of  Material  Engineering,  National  Chung  Hsing  University, 

Taichung,  Taiwan 

A  New  Process  to  Produce  Advanced  Zirconia-based  Ceramic  797 

Composites  from  Low-Value  Minerals 
Sonia  M.B.  Veiga*,  Marcello  M.  Veiga**,

A.C.D.  Chaklader**,  J.C.  Bressiani*

*  Inst.  de  Pesquisas  Energeticas  e  Nucleares,  Sao  Paulo,  Brasil

**  University  of  British  Columbia,  Vancouver,  BC,  Canada 

High  Temperature  Flow  Stress  Model  and  Hot  Deformation  805 

Behaviors  for  High-Mo  Austenitic  Stainless  Steel 
Xu  Yourong,  Chen  Liangshen,  Jin  Lei,  Wang  Deying 

Materials  Science  and  Engineering,  Shanghai  University, 

Jiading,  P.R.  China. 

Intelligent  Manufacturing  III  811 

Information  Management  of  Complex  Systems:  813

Perspectives  for  the  New  Millennium 
Z.  Gomolka*,  E.  Szczerbicki  ** 

*  University  of  Szczecin,  Szczecin,  Poland 

**  University  of  Newcastle,  Newcastle,  Australia 



Present  Status  of  Intelligent  Machines  in  Sheet  Metal  Fabricating  and  817
Forming  in  Japan

Junichi  Endou

Kanagawa  Institute  of  Technology,  Kanagawa,  Japan

Design  of  Enterprise  Network  Communication  Subsystems  823

Adam  Grzech

Wroclaw  University  of  Technology,  Poland

The  Industrial  Desktop  -  Real  Time  Business  and  Process  829

Analysis  to  Increase  Productivity  in  Industrial  Plants

Osvaldo  A.  Bascur

OSI  Software,  Inc.,  The  Woodlands,  Texas,  USA

Enterprise  Staff  Scheduling  by  Genetic  Algorithm  Search  839

Tiehua  Zhang*,  William  A.  Gruver*,  Michael  H.  Smith**

*  Simon  Fraser  University,  Burnaby,  BC,  Canada
**  University  of  California,  Berkeley,  CA,  USA  and

Intelligence  in  Surface  Processing  of  Materials  845

Intelligent  AE  Sensor  for  Monitoring  of  Finish  Machining  Process  847

Slavko  Dolinsek*,  J.  Kopac*,  Z.J.  Viharos*,  L.  Monostori**

*  University  of  Ljubljana,  Mechanical  Engineering,  Slovenia
**  Hungarian  Academy  of  Science,  Budapest,  Hungary

A  New  Fuzzy-Fractal  Approach  for  Surface  Quality  Control  in  855

Intelligent  Manufacturing  of  Materials

P.  Melin,  O.  Castillo

Tijuana  Institute  of  Technology,  Chula  Vista,  California,  USA

A  Study  on  Axisymmetric  Indentation  by  the  Rigid-Plastic  861
Finite-Boundary  Element  Method

Yong-Ming  Guo,  Kenji  Nakanishi

Dep't.  of  Mechanical  Engineering,  Kagoshima  University,  Kagoshima,  Japan

Design  of  Intelligent  Spindle  for  High  Speed  Machining  867

B.L.  Zhang,  Y.P.  Li,  B.S.  Zhu,  P.  Ma,  Y.  Luo

Guangdong  University  of  Technology,  Guangzhou,  China

Robotics  and  Intelligent  Control  I  869

Autonomous  Control  of  Complex  Dynamical  Systems  in  871

Support  of  a  Manned  Mission  to  Mars

James  A.  Kurien,  Daniel  J.  Clancy

NASA  Ames  MS  269-3,  Moffett  Field,  California,  USA



Mining  Automation  in  the  Next  Millennium:  a  Tele-Operated 
LHD  Vehicle  Model 

Yeen-Shien  Hwang*1,  Neda  Farmer*2,  Jason  Hart  ** 

*University  of  British  Columbia,  Vancouver,  B.C.,  Canada 

1  Huckleberry  Mines  Ltd.,  Houston,  B.C.,  Canada 

2  Luscar  Coal  Mine,  Hinton,  A.B.,  Canada 

**  Nautilus  International  Limited,  Burnaby,  B.C.,  Canada 

Dynamic  Reconfiguration  of  Holonic  Lower  Level  Control 
X.  Zhang*,  D.H.  Norrie*,  A.  Kusiak  ** 

*  University  of  Calgary,  Canada 
**  University  of  Iowa,  USA 

Intelligent  Process  Monitoring  for  Paper  Machines 
Janos  L.  Grantner*,  Peter  E.  Parker**,  George  A.  Fodor  *** 

*  Department  of  Electrical  and  Computer  Engineering, 

West  Michigan  University,  Kalamazoo,  Michigan,  USA. 

**  Department  of  Paper  and  Printing  Science  and  Engineering, 

West  Michigan  University,  Kalamazoo,  Michigan,  USA. 

***  ABB  Automation  Products  AB,  Vasteras,  Sweden

An  Integration  Design  Approach  in  PID  Controller 
Jen-Yang  Chen

China  Institute  of  Technology  and  Commerce,  Taipei,  Taiwan 

Holonically  Object  Oriented  System 
Shigeki  Sugiyama 

Gifu  Industry  and  Technology  Research  Center,  Kasamatsu-Cho, 
Hashima-Gun,  Gifu-Ken,  Japan. 

Intelligent  Instrumentation  and  Measurement 

Pre-Processing  of  Industrial  Process  Data  for  Outlier  Detection 
and  Correction 

Jonathan  Tenner*,  Derek  A.  Linkens*,  T.J.  Bailey  ** 

*  University  of  Sheffield,  UK. 

**British  Steel  Engineering  Steels  U.K.  Ltd. 

Intelligent  Measurement  System  Confirmation 
P.  H.  Osanna,  M.N.  Durakbasa 

Vienna  University  of  Technology,  Wien,  Austria 

Simulation  of  the  Dynamic  Properties  of  Nuclear  Meters  in  Coal 
Preparation  Control  Systems 
Stanislaw  Cierpisz 

Silesian  Technical  University,  Poland 




Acoustic  Emission  Monitoring  of  SAG  Mill  Performance  939 

S.J.  Spencer,  J.J.  Campbell,  K.R.  Weller,  Y.  Liu 

CSIRO  Minerals,  Queensland,  Australia 

Novel  Polymeric  Electrochemical/Chemical  Sensors  and  Display  Devices  947 
Integrated  with  Artificial  Intelligence 
A.  Talaie*,**,  J.Y.  Lee***,  Y.K.  Lee****,  J.  Jang*,

D.J.  Choo****,  S.H.  Park****,  G.  Huh****,  J.A.  Romagnoli** 

*  Physics  Department,  Kyung  Hee  University,  Seoul,  Korea 

**  Chemical  Engineering,  Sydney  University,  Sydney,  Australia 
***  Chemistry  Dep't.,  NSW  University,  Sydney,  Australia 
****  Kyung  Hee  University,  Seoul,  Korea 

Material  Properties  under  Drawing  and  Extrusion  with  953 

Cyclic  Torsion 

L.X.  Kong,  P.D.  Hodgson,  L.  Lin  and  B.  Wang 

School  of  Engineering  and  Technology,  Deakin  University, 

Geelong,  Victoria,  Australia 

Artificial  Neural  Networks  II  959 

Neural  Network-based  Resistance  Spot  Welding  Quality  Prediction  961 

N.  Ivezic,  J.D.  Allen,  Jr.,  T.  Zacharia 

Oak  Ridge  National  Laboratory,  Oak  Ridge,  TN,  USA 

An  Adaptive  Artificial  Neural  Network  to  Model  a  Cu/Pb/Zn  Flotation  967 
Circuit 

Saiedeh  Forouzi,  John  A.  Meech 

University  of  British  Columbia,  Vancouver,  B.C.,  Canada 

Multivariable  Predictive  Neuronal  Control  Applied  to  975 

Grinding  Plants 

Manuel  Duarte*,  Alejandro  Suarez**,  Danilo  Bassi  *** 

*  Dep't.  Ing.  Electrica,  Universidad  de  Chile,  Santiago,  Chile 

**  Dep't.  de  Electronica,  Univ.  T.F.  Sta.  Maria,  Valparaiso,  Chile 
***  Dep't.  de  Informatica,  Univ.  de  Santiago,  Santiago,  Chile 

Practical  Neural  Network  Applications  in  the  Mining  Industry  983 

Logan  Miller-Tait,  Rimas  Pakalnis 

University  of  British  Columbia,  Vancouver,  B.C.,  Canada 

Neural  Network  Model  and  Model-Based  Control  of  Deformation  989
Processing

Nenad  Ivezic,  John  D.  Allen,  Jr.,  Thomas  Zacharia

Oak  Ridge  National  Laboratory,  Oak  Ridge,  TN,  USA




Verifying  Detected  Facial  Parts  by  Multidirectional  Associative  Memory  995 
Miki  Kitabata,  Yoshiyasu  Takefuji 

Keio  University,  Fujisawa  Kanagawa,  Japan 

A  Current-Mode  Sorting  Circuit  for  Pattern  Recognition  1003 

Gu  Lin,  Bingxue  Shi

Microelectronics,  Tsinghua  University,  Beijing,  P.R.  China 

Intelligence  in  the  Design  of  Materials  and  Processes  II  1009 

Intelligent  Design  Methods  for  Smart  Materials  1011 

Madjid  Fathi-Torbaghan,  L.  Hildebrand 

Dep't.  of  Computer  Science,  University  of  Dortmund,  Dortmund,  Germany 

Identification  of  a  Model  Which  Relates  Variations  in  Shape  1017 


Geometry  to  Process  Control  Variables  of  Shape  Forging 

B.F.  Rolfe*,  M.J.  Cardew-Hall*,  G.A.W.  West**,  S.M.  Adballah*

*  Australian  National  University,  Canberra,  ACT,  Australia 
**Curtin  University  of  Technology,  Perth,  Australia 

Mechanical  Characteristics  of  HIPed  SiC  Particulate-Reinforced  1023 

Al-Alloy  MMCs 

C.Y.  Chung,  K.C.  Lau

City  University  of  Hong  Kong,  Kowloon,  Hong  Kong,  P.R.  China 

Hydrostatic  Extrusion  of  Composite  Rod  1029 

Ui-Bin  Tsai,  Chi-Wei  Wu,  Ray-Quen  Hsu 

National  Chiao-Tung  University,  Hsin-Chu,  Taiwan 

Numerical  Modelling  and  Localized  Failure  Analysis  in  Metal  Powder  1035
Forming  Processes 

A.R.  Khoei,  R.W.  Lewis,  D.T.  Gethin 

Mechanical  Engineering,  University  of  Wales  Swansea,  UK 

Microstructure  and  High  Temperature  Deformation  Behavior  of  a  1041 

TiN/Ti5Si3  Nano-Grain  Composite  Produced  by
Non-Equilibrium  PM  Processing 
Kei  Ameyama  and  Yasuhiko  Suehiro 

Ritsumeikan  University,  Kusatsu  City,  Shiga,  Japan 

Shape  Prediction  of  Growing  Billet  in  Spray  Casting  1047 

using  a  Scanning  Gas  Atomizer 

Eon-Sik  Lee*,  Sangho  Ahn*  and  Shinill  Kang  ** 

*  Research  Institute  of  Industrial  Science  and  Technology, 

Advanced  Materials  Division,  Pohang,  Kyungbuk,  South  Korea 

**Yonsei  University,  Seoul,  Korea 



Intelligence  in  Concurrent  Engineering  1053

Modelling  Design  Planning  in  Concurrent  Engineering  1055

C.  Reidsema  and  E.  Szczerbicki

Dep't.  of  Mechanical  Engineering,

University  of  Newcastle,  NSW,  Australia

Computer-Aided  Integrated  Design  for  Injection  Molding  1061

Yuh-Min  Chen*,  Rong-Shean  Lee*,  ChengTer  Ted  Ho***

*  National  Cheng  Kung  University,  Tainan,  Taiwan

***  National  KaoHsiung  Inst.  of  Science  and  Technology,  Taiwan

Artificial  Psychology  -  an  Attainable  Scientific  Research  on  the  1067
Human  Brain

Zhiliang  Wang,  Lun  Xie

University  of  Science  &  Technology  (USTB),  Beijing,  P.R.  China

Soft-Object  Technology  for  Autonomous  Manufacturing  1073

Components  Control

Ahmed  Hambaba

College  of  Engineering,  San  Jose  State  University,  San  Jose,  CA

A  Monitoring  Framework  for  Software  Project  Development  1079
Ho-Leung  Tsoi*  and  Derek  Cheung**

*  Software  Quality  Institute,  Griffith  University,  Australia
**  Computer  Studies,  City  University  of  Hong  Kong,  Hong  Kong

Redefining  the  Web:  toward  the  Creation  of  Large-Scale  1087

Distributed  Applications

Guy  M.  Nicoletti

Engineering  Department,  University  of  Pittsburgh  at  Greensburg,
Pennsylvania,  USA

How  Can  We  Form/Expand  Conceptions  in  Workers'  Minds  1093
According  to  Their  Individualities?

Kumiko  Ishino

Konan  University,  Utsunomiya-City,  Kobe,  Japan

Robotics  and  Intelligent  Control  II  1101

Navigation  by  Weighted  Chance  1103

S.  Reimann*,  A.  Mansour**

*  German  National  Research  Center  for  Information  Technology,
Birlinghoven,  Germany

**  Bio-Mimetic  Control  Research  Center  (RIKEN),  Nagoya,  Japan



Vehicle  Routing  Problem  Using  Clustering  Algorithm  by 
Maximum  Neural  Networks 
Noriko  Yoshiike,  Yoshiyasu  Takefuji 

Keio  University,  Fujisawa,  Kanagawa,  Japan 

Acquisition  of  Communication  Protocol  for  Autonomous 

Multi-AGVs  Driving 

Michiko  Watanabe,  Masashi  Furukawa 

Asahikawa  National  College  of  Technology,  Hokkaido,  Japan 

Heuristic  Neuro-Fuzzy  Model  For  Evaluation  of  Urban 

Transportation  Projects 

Marcus  Vinicius  Quintella  Cury,  Saul  Fuks 

Universidade  Federal  do  Rio  de  Janeiro,  UFRJ,  Brasil 

Optimal  Controller  Design  for  Finite  Word  Length  Implementation 
using  a  Genetic  Learning  Algorithm 
Wen-Shyong  Yu 

Tatung  Institute  of  Technology,  Taipei,  Taiwan 

Adaptive  Fuzzy  Controller  for  Non-Linear  Uncertain  Systems 
Chiang-Cheng  Chiang,  Chih-Chien  Hu 

Tatung  Institute  of  Technology,  Taipei,  Taiwan,  R.O.C. 

Hybrid  Modeling  (view-graphs) 

Holistic  Strategies  for  Designing  Multistage  Material  Processes 
W.G.  Frazier 

Air  Force  Research  Laboratories,  Wright-Patterson  AFB,  Ohio 

A  New  Methodology  of  Using  Design  of  Experiments  as  a 
Precursor  to  Neural  Networks  for  Material  Processing: 

Extrusion  Die  Design 

Bhavin  Mehta,  Hamza  Ghulman,  Rick  Gerth

Ohio  University,  Athens,  Ohio 

Incorporating  Hybrid  Models  into  a  Framework  for  Design  of 
Multi-Stage  Material  Processes 
E.  Medina 

Air  Force  Research  Laboratories,  Ohio 

Hybrid  Modeling  for  the  Interdisciplinary  Design  of  More 
Affordable  Systems 

J.  Poindexter,  Gerald  R.  Shumaker,  Brian  A.  Stucke

Air  Force  Research  Laboratories,  Ohio 




Hybrid  Modeling  for  Testing  Intelligent  Software  for  Lunar-Mars  1179

Closed  Life  Support 
Jane  T.  Malin 

Intelligent  Systems  Branch,  Automation,  Robotics  and  Simulation  Division 
NASA  Johnson  Space  Center,  Houston,  Texas,  USA 

Discrete  Modeling  via  Function  Approximation  Methods  -  1185 

Towards  Bridging  Atomic-  and  Micro-Scales 
A.G.  Jackson,  M.  Benedict 

Air  Force  Research  Laboratories,  Dayton,  Ohio,  USA 

Microstructure  Predictions  From  Atomistic  Rule  Set  Cellular  Automata  1197
M.O.  Zacate,  R.W.  Grimes,  P.D.  Lee 

Imperial  College,  London  University,  London,  England,  U.K. 

Fuzzy  Molecular  Modeling  1225 

David  A.  Ress 

North  Carolina  State  University,  North  Carolina,  USA 

Imaging  Studies  and  Density  Functional  Analysis  of  Surfaces  and  1235 

Interfaces:  Comparison  of  Theory  and  Experiment 
John  F.  Maguire,  Steven  R.  LeClair

Air  Force  Research  Lab,  Wright-Patterson  AFB,  Dayton,  Ohio 

Modeling  Gas  Byproducts  from  MOCVD  Thin-Film  Depositions  1241 

J.  G.  Jones,  P.D.  Jero 

Air  Force  Research  Lab,  Wright-Patterson  AFB,  Dayton,  Ohio 

Imaging  for  Process  Optimization  and  Control  (view-graphs)  1247 

Nondestructive  Imaging  of  Surface  &  Sub-Surface  Defects  in  Thin-Films  1249 
with  Super  Spatial  Resolution  using  Evanescent  Microwave  Fields 
Massood  Tabib-Azar 

Case  Western  Reserve  University,  Cleveland,  Ohio 

Investigation  of  Raman  Imaging  for  Advanced  Control  of  1258 

YBCO  Cool-Down  Processing  using  Pulsed  Laser  Deposition

J.D.  Busbee*1,  R.R.  Biggers*,  J.G.  Jones*,  D.V.  Dempsey*2,  G.  Kozlowski**

*  AFRL,  Materials,  Wright-Patterson  AFB,  Dayton,  Ohio 
**  AFRL,  PRP,  Wright-Patterson  AFB,  Ohio 

1  Technical  Management  Concepts,  Inc.,  Beavercreek,  Ohio 

2  University  of  Dayton  Research  Institute,  Dayton,  Ohio 

Process  Control  Via  Gaze  Detection  Technology  1263 

Jaihie  Kim*,  Gang  Ryung  Park*,  Steven  R.  LeClair  ** 

*  Yonsei  University,  Seoul,  Korea 

**  AFRL,  Wright-Patterson  AFB,  Dayton,  Ohio 




The  Third  Eye  Cameras  -  Dynamic  and  Static  Hyperspectrum  Imaging  1271 
Yoshiyasu  Takefuji 

Environmental  Information,  Keio  University,  Fujisawa,  Japan 

The  Third  Eye  Approach  to  Innovative  Designs  and  Applications  into  the  1277 
21st  Century  -  Human  Recognition  System  by  Nonlinear  Oscillations 
Souichi Oka, Yoshiyasu Takefuji, William Huang

Environmental  Information,  Keio  University,  Fujisawa,  Japan 

Intelligent  Rate  Control  for  MPEG-4  Coders  1285 

Gwanh Hoon Park, Jae Hyung Park, Yoon Jin Lee

Yonsei  University,  Computer  Science,  Wonju,  Kwangwon,  Korea 

Concept,  Development,  Mass  Production,  and  Applications  1297 

of  Artificial  Retina  Chips 
Kazuo  Kyuma 

Mitsubishi  Electric  Corporation,  Japan 

Data  Reduction  via  Auto-Associative  Neural  Networks  1305 

Claudia Kropas-Hughes

Air  Force  Research  Lab,  Wright-Patterson  AF  Base,  Dayton,  OH 

Image  Processing  Plume  Fluence  for  Superconducting  Thin-Film  1317 

Depositions 

J.G. Jones*, R.R. Biggers*, J.D. Busbee*1, D.V. Dempsey*2, G. Kozlowski**

*  Air  Force  Research  Lab,  Materials  Direct.,  WPAFB,  Dayton,  OH 
**  Air  Force  Research  Laboratory,  PRP,  WPAFB,  Dayton,  OH 

1  Technical  Management  Concepts,  Inc.  Beavercreek,  OH 

2  University  of  Dayton  Research  Institute  Dayton,  OH 

Innovations  in  Materials  Design  1321 

Towards  the  Future:  Innovations  in  Materials  Design  1323 

Suichi  Iwata 

RACE, University of Tokyo, Faculty of Eng., Hongo, Japan

Atomic  Environments  in  Relation  to  Compound  Prediction  1339 

Jo  Daams*,  Pierre  Villars  ** 

* Philips Research, The Netherlands

**  Materials  Phases  Data  System  (MPDS),  Vitznau,  Switzerland 

Analysis  and  Visualization  of  Category  Membership  Distribution  in  1361 

Multivariate  Data 

Yoh-Han Pao*, B.F. Duan*, Y.L. Zhao*, Steven R. LeClair**

*  Case  Western  Reserve  University,  Cleveland,  Ohio 

**  Air  Force  Research  Lab,  Wright-Patterson  AFB,  Dayton,  OH 




Whitney  Reduction  Networks  for  Process  Discovery  1371 

Mark  Oxley 

Mathematics  and  Statistics,  Air  Force  Inst.  Tech.,  WPAFB,  Ohio 

Algorithms  for  Predicting  Properties  of  Materials  from  Intelligent  1381 

Materials  Design  by  Hyperspace  Data  Mining 
Nianyi  Chen*,  Dongping  Daniel  Zhu  ** 

* Shanghai Metallurgy Inst. of Chinese Acad. of Sci., P.R. China
**  Zaptron  Systems,  Inc.,  Mountain  View,  CA,  USA 

Data  Bases  and  Semantic  Networks  for  Inorganic  Materials  1387 

Computer  Design 
N.N.  Kiselyova 

A.A.Baikov  Institute  of  Metallurgy  and  Materials  Science,  Russian  Academy  of 
Science,  Moscow,  Russia 

First-Principles  Calculations  for  Materials  Science:  1397 

Their  Power  and  Limitations 
Wanda  Andreoni 

IBM  Research  Division,  Zurich  Research  Laboratory,  Switzerland 

Interplay  Between  Large  Materials  Databases,  Semi-Empirical  1399 

Approaches,  Neuro-Computing  and  First  Principle  Calculations 
Pierre  Villars  *,  Steven  R.  LeClair  **,  Suichi  Iwata  *** 

*  Material  Phases  Data  System  (MPDS),  Vitznau,  Switzerland 
**  Air  Force  Research  Laboratory,  Wright-Patterson  AFB,  Ohio 
***  RACE,  Faculty  of  Eng.,  University  of  Tokyo,  Hongo,  Japan 

Software  Package  "MATERIALS  DESIGNER"  and  its  Application  1417 

in  Materials  Research 

Nianyi  Chen*,  Wencong  Lu**,  Ruiliang  Chen*,  Pei  Qin 

* Shanghai Metallurgy Inst. of Chinese Acad. of Sci., P.R. China
**  Department  of  Chemistry,  Shanghai  University,  P.R.  China 


Author's Index II-1




Plenary  Presentations 




From  Computing  with  Numbers  to  Computing  with  Words:  from 
Manipulation  of  Measurements  to  Manipulation  of  Perceptions 

Lotfi  A.  Zadeh 

*  Professor  in  the  Graduate  School  and  Director, 

Berkeley  Initiative  in  Soft  Computing  (BISC) 

Computer  Science  Division  and  the  Electronics  Research  Laboratory, 
Department  of  Electrical  Engineering  and  Computer  Science, 

University  of  California,  Berkeley,  CA  94720-1776  USA 
Tel:  510-642-4959  Fax:510-642-1712  Email:  zadeh@cs.berkeley.edu 

Computing,  in  its  usual  sense,  is  centered  on  manipulation  of  numbers  and  symbols.  In  contrast,  computing 
with  words,  or  CW  for  short,  is  a  methodology  in  which  the  objects  of  computation  are  words  and 
propositions  drawn  from  a  natural  language,  e.g.,  small,  large,  far,  heavy,  not  very  likely,  the  price  of  gas  is 
low  and  declining,  Berkeley  is  near  San  Francisco,  it  is  very  unlikely  that  there  will  be  a  significant 
increase  in  the  price  of  oil  in  the  near  future,  etc. 

Computing  with  words  is  inspired  by  the  remarkable  human  capability  to  perform  a  wide  variety  of 
physical  and  mental  tasks  without  any  measurement  and  any  computation.  Familiar  examples  of  such  tasks 
are  parking  a  car,  driving  in  heavy  traffic,  playing  golf,  riding  a  bicycle,  understanding  speech  and 
summarizing  a  story.  Underlying  this  remarkable  capability  is  the  brain's  crucial  ability  to  manipulate 
perceptions  -  perceptions  of  distance,  size,  weight,  color,  speed,  time,  direction,  force,  number,  truth, 
likelihood  and  other  characteristics  of  physical  and  mental  objects.  Manipulation  of  perceptions  plays  a  key 
role  in  human  recognition,  decision  and  execution  processes.  As  a  methodology,  computing  with  words 
provides  a  foundation  for  a  computational  theory  of  perceptions  —  a  theory  which  may  have  an  important 
bearing  on  how  humans  make  —  and  machines  might  make  —  perception-based  rational  decisions  in  an 
environment  of  imprecision,  uncertainty  and  partial  truth. 

A  basic  difference  between  perceptions  and  measurements  is  that,  in  general,  measurements  are  crisp 
whereas  perceptions  are  fuzzy.  One  of  the  fundamental  aims  of  science  has  been  and  continues  to  be  that  of 
progressing  from  perceptions  to  measurements.  Pursuit  of  this  aim  has  led  to  brilliant  successes.  We  have 
sent  men  to  the  moon;  we  can  build  computers  that  are  capable  of  performing  billions  of  computations  per 
second;  we  have  constructed  telescopes  that  can  explore  the  far  reaches  of  the  universe;  and  we  can  date  the 
age  of  rocks  that  are  millions  of  years  old.  But  alongside  the  brilliant  successes  stand  conspicuous 
underachievements  and  outright  failures.  We  cannot  build  robots  which  can  move  with  the  agility  of 
animals  or  humans;  we  cannot  automate  driving  in  heavy  traffic;  we  cannot  translate  from  one  language  to 
another  at  the  level  of  a  human  interpreter;  we  cannot  create  programs  which  can  summarize  non-trivial 
stories;  our  ability  to  model  the  behavior  of  economic  systems  leaves  much  to  be  desired;  and  we  cannot 
build  machines  that  can  compete  with  children  in  the  performance  of  a  wide  variety  of  physical  and 
cognitive  tasks. 

It may be argued that underlying the underachievements and failures is the unavailability of a methodology
for reasoning and computing with perceptions rather than measurements. An outline of such a methodology,
referred to as a computational theory of perceptions, is presented in this paper. The computational theory
of perceptions, or CTP for short, is based on the methodology of computing with words (CW). In CTP,
words play the role of labels of perceptions and, more generally, perceptions are expressed as propositions
in a natural language. CW-based techniques are employed to translate propositions expressed in a natural
language into what is called the Generalized Constraint Language (GCL). In this language, the meaning of
a proposition is expressed as a generalized constraint, X isr R, where X is the constrained variable, R is the
constraining relation and isr is a variable copula in which r is a variable whose value defines the way in
which R constrains X. Among the basic types of constraints are: possibilistic, veristic, probabilistic,


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




random  set,  Pawlak  set,  fuzzy  graph  and  usuality.  The  wide  variety  of  constraints  in  GCL  makes  GCL  a 
much  more  expressive  language  than  the  language  of  predicate  logic. 

In  CW,  the  initial  and  terminal  data  sets,  IDS  and  TDS,  are  assumed  to  consist  of  propositions  expressed  in 
a  natural  language.  These  propositions  are  translated,  respectively,  into  antecedent  and  consequent 
constraints.  Consequent  constraints  are  derived  from  antecedent  constraints  through  the  use  of  rules  of 
constraint  propagation.  The  principal  constraint  propagation  rule  is  the  generalized  extension  principle.  The 
derived  constraints  are  re-translated  into  a  natural  language,  yielding  the  terminal  data  set  (TDS).  The  rules 
of constraint propagation in CW coincide with the rules of inference in fuzzy logic. A basic problem in CW
is  that  of  explicitation  of  X,  R  and  r  in  a  generalized  constraint,  X  isr  R,  which  represents  the  meaning  of  a 
proposition,  p,  in  a  natural  language. 
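
As a rough illustration of constraint propagation, the sketch below applies the classical extension principle (a special case, not the generalized principle itself) to a possibilistic constraint of the form "X is A" on a finite domain. The function name `extend`, the fuzzy set `about_zero` and its membership grades are assumptions made for this example and do not come from the paper.

```python
def extend(f, fuzzy_x):
    """Propagate the constraint 'X is A' through y = f(x).

    fuzzy_x maps each x to its membership grade in A. The grade of each y
    is the largest grade among all x with f(x) == y (sup over the preimage).
    """
    fuzzy_y = {}
    for x, grade in fuzzy_x.items():
        y = f(x)
        fuzzy_y[y] = max(fuzzy_y.get(y, 0.0), grade)
    return fuzzy_y


# Hypothetical antecedent constraint: "X is about zero".
about_zero = {-2: 0.3, -1: 0.7, 0: 1.0, 1: 0.8, 2: 0.4}

# Derived (consequent) constraint on Y = X * X.
about_zero_squared = extend(lambda x: x * x, about_zero)
print(about_zero_squared)
```

Here the antecedent constraint on X yields a consequent constraint on Y. Where several values of X map to the same Y, the highest grade is kept.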

There  are  two  major  imperatives  for  computing  with  words.  First,  computing  with  words  is  a  necessity 
when  the  available  information  is  too  imprecise  to  justify  the  use  of  numbers;  and  second,  when  there  is  a 
tolerance  for  imprecision  which  can  be  exploited  to  achieve  tractability,  robustness,  low  solution  cost  and 
better  rapport  with  reality.  Exploitation  of  the  tolerance  for  imprecision  is  an  issue  of  central  importance  in 
CW  and  CTP.  At  this  juncture,  the  computational  theory  of  perceptions  —  which  is  based  on  CW  —  is  in  its 
initial  stages  of  development.  In  time,  it  may  come  to  play  an  important  role  in  the  conception,  design  and 
utilization  of  information/intelligent  systems.  The  role  model  for  CW  and  CTP  is  the  human  mind. 


Research supported in part by NASA Grant NAC2-1177, ONR Grant N00014-96-1-0556,
ARO  Grant  DAAH  04-961-0341  and  the  BISC  Program  of  UC  Berkeley. 




Hybrid  Modeling  for  Testing  Intelligent  Software 
for  Lunar-Mars  Closed  Life  Support 

Jane  T.  Malin 

Intelligent  Systems  Branch,  Mail  Code  ER2 
Automation,  Robotics  and  Simulation  Division 
NASA  Johnson  Space  Center 
Houston,  Texas,  USA  77058-3696 
Email: malin@jsc.nasa.gov


ABSTRACT 

Intelligent  software  is  being  developed  for  closed  life  support  systems  with  biological  components,  for 
human  exploration  of  the  moon  and  Mars.  The  intelligent  software  functions  include  planning/scheduling, 
reactive  discrete  control  and  sequencing,  management  of  continuous  control,  and  fault  detection,  diagnosis, 
and  management  of  failures  and  errors.  To  develop  and  test  the  software  and  to  provide  operational  model- 
based  what-if  analyses,  four  types  of  modeling  information  have  been  essential  to  system  modeling  and 
simulation:  1)  discrete  component  operational  and  failure  modes,  2)  continuous  dynamic  performance 
within  component  modes,  modeled  qualitatively  or  quantitatively,  3)  configuration  of  flows  and  power 
among  components  in  the  system,  and  4)  operations  activities  and  scenarios.  The  CONFIG  modeling  and 
simulation  tool  has  been  used  to  model  components  and  systems  involved  in  production  and  transfer  of 
oxygen  and  carbon  dioxide  gas  in  a  plant  growth  chamber  and  between  that  chamber  and  a  habitation 
chamber  with  physico-chemical  systems  for  gas  processing.  CONFIG  is  a  multi-purpose  discrete  event 
simulation  tool  that  integrates  all  four  types  of  models,  for  use  throughout  the  engineering  and  operations 
life  cycle.  Within  CONFIG,  continuous  algebraic  quantitative  models  are  used  within  an  abstract  discrete 
event  qualitative  modeling  framework  of  component  modes  and  activity  phases.  Component  modes  and 
activity  phases  are  embedded  in  transition  digraphs.  Flows  and  flow  reconfigurations  are  efficiently 
analyzed  during  simulations.  Modeled  systems  for  the  Life  Support  Testbed  have  included  biological  plants 
(using  algebraic  quantitative  models)  and  crew  members,  oxygen  concentrators  and  an  oxygen  transfer 
control  subsystem,  and  injectors  and  flow  controllers  in  a  carbon  dioxide  control  subsystem.  To  test  the 
discrete  control  software,  some  elements  of  the  lower  level  control  layer  and  higher  level  planning  layer  of 
the  intelligent  software  architecture  are  modeled,  using  CONFIG  activity  models.  CONFIG  simulations 
show  effects  of  events  on  a  system,  including  control  action  or  failures,  local  and  remote  effects,  and 
behavioral  and  functional  effects,  the  time  course  of  effects,  and  how  effects  may  be  detected.  The 
CONFIG  models  are  interfaced  to  the  discrete  control  software  layer  and  used  to  perform  dynamic 
interactive  testing  of  the  software  in  nominal  and  off-nominal  scenarios. 


INTRODUCTION 

Consumables  production  plants  and  closed  life  support  systems  with  biological  components  are  being 
developed  for  human  exploration  of  the  moon  and  Mars.  Hierarchical  intelligent  control  software  is  being 
designed  for  autonomous  operations  of  these  systems  of  processors  that  convert  resources  to  products.  The 
intelligent  control  systems  are  made  up  of  layers  of  functions  including  planning/scheduling,  reactive 
discrete  control  and  sequencing,  management  of  continuous  control,  and  management  of  failures  and  errors. 
These systems accomplish four broad types of systems management: 1) planning and scheduling for storage
and  transport  of  resources  and  products,  2)  discrete  or  continuous  control  of  processes,  or  performance  of 
processors,  3)  execution  of  procedures  and  sequences  for  discrete  control  of  operational  configurations  and 
modes  of  processors  and  other  components,  and  4)  management  of  instrumentation  and  control  subsystems. 
Whether  a  system  goal  can  be  achieved  or  a  hazard  can  be  avoided,  within  some  time  and  within  some 
resources,  depends  on  capability  both  to  establish  an  operating  mode  (often  in  the  context  of  a  supporting 
flow  or  power  configuration),  and  to  process  or  perform  at  a  predicted  rate  within  that  target  mode. 




INTELLIGENT  SOFTWARE  FOR  CONTROL  OF  LIFE  SUPPORT  SYSTEMS 

In NASA's Lunar Mars Life Support Test Program (LMLSTP) Phase III 90-day manned test, a three-tiered
(3T) hierarchical autonomous control architecture was tested [1]. The 3T software provided integrated
monitoring and control (IMC) of product gas transfer between a plant growth chamber, a crew chamber and
an incinerator [11]. The basic configuration of chambers for the test is shown in Figure 1. Four crew
members lived in the 20 Foot Chamber for 90 days. A physico-chemical air revitalization system (ARS) in
the 20 Foot Chamber included a four-bed molecular sieve (4BMS) to remove CO2 from the chamber, and a
CO2 Removal System (CRS) and O2 Generation System (OGS) that worked together to add O2 to the
chamber. The variable pressure growth chamber (VPGC) housed plant trays for growing staged crops of
wheat. The airlock was connected to the VPGC and the incinerator. The incinerator periodically converted
human waste and paper products from the 20 Foot Chamber into CO2 and H2O. The IMC software managed
the configuration of the CO2 supply for transfer to the VPGC for plant photosynthesis and O2 production,
and managed the systems for concentrating, storing and transferring the O2 produced by the plants. The
IMC software managed the configuration of the O2 supply for transfer to the airlock for incineration or to
the crew chamber. The software flexibly reconfigured and transferred gas among multiple reservoirs in
response to predicted needs, observed usage and problems with elements of the system.


Fig. 1. Product Gas Transfer in the Phase III Test (diagram labels: Incinerator, 20 Foot Chamber)

The  uppermost  tier  of  3T  is  a  planner  that  handles  management  of  resources  and  products,  and  the  middle 
tier  is  a  sequencer.  The  sequencer  provides  a  reactive  discrete  control  layer  that  handles  event-based 
control,  sequencing  and  procedures  for  managing  operational  configurations  and  phases  of  operation  [2]. 
The  planner  can  alter  the  sequencer’s  task  agenda.  The  lowest  tier,  the  skill  manager,  handles  low  level 
control.  The  skill  manager  interfaces  with  both  the  sequencer  and  the  hardware,  and  manages  continuous 
performance  of  processors  and  the  continuous  control  systems  themselves.  The  discrete  and  continuous 
control  layers  and  a  user  interface  layer  can  manage  instrumentation  and  control  subsystems. 


HETEROGENEOUS  MODELS 

The  layered  approach  provides  several  advantages,  including  modularity,  separation  of  concerns,  and 
support  for  multiple  levels  of  intervention.  However,  these  layers  relate  to  distinct  engineering  approaches 
associated  with  the  four  types  of  system  management  goals,  and  a  conceptual  framework  is  needed  to 


7 


integrate  the  diverse  approaches.  The  system-management  framework  helps  bridge  the  gap  between 
conventional  continuous  models  and  analyses  and  discrete  symbolic  models  for  autonomous  systems  control 
[5].  In  the  LMLSTP  case,  the  resources  and  products  are  C02  and  02.  To  support  planning  and  scheduling, 
systems  of  simplified  processing-rate  models  are  typically  used  for  analysis  of  resources  and  balances  in 
scenarios.  The  LMLSTP  processors  are  wheat  plants,  crew,  CRS/OGS,  and  the  incinerator.  To  support 
control  and  sizing  analysis,  differential  and  algebraic  models  are  used,  based  on  analytic  or  empirical  data. 
The  LMLSTP  configurations  include  flow  paths,  valves,  pumps,  fans,  injectors,  chambers,  tanks  and 
processors.  To  support  managing  configurations  for  operations,  models  are  typically  systems  of  connected 
state  transition  models.  The  instrumentation  and  control  subsystem  includes  gas  concentration  control, 
processor  control,  flow  and  pressure  control  and  valve  control.  To  support  managing  instrumentation  and 
control  subsystems,  state  transition  models  are  used  for  modes  of  control  or  control  regimes. 

Four  heterogeneous  types  of  models  have  been  essential  to  develop,  test  and  maintain  intelligent  control 
software  for  space  life  support  systems  and  to  provide  operational  model-based  what-if  analyses:  1)  discrete 
functional  operational  and  failure  modes  of  components,  2)  continuous  dynamic  performance  within 
component  modes,  modeled  qualitatively  or  quantitatively,  3)  configuration  of  flows  and  power  among 
components  in  the  system,  and  4)  operations  activities,  schedules  and  scenarios.  The  CONFIG  simulation 
tool has provided the necessary integration of all four types of models [6, 8, 9, 10], making it a suitable
testbed for dynamic interactive simulation-based testing of the LMLSTP IMC application of the 3T layered
control software [7].


CONFIG  HYBRID  DISCRETE  EVENT  SIMULATION 

CONFIG  was  developed  to  support  analysis  of  designs  for  systems  and  their  operations.  CONFIG  extends 
discrete  event  simulation  with  capabilities  for  continuous  system  modeling.  The  purpose  of  these 
enhancements  has  been  to  make  it  possible  to  apply  discrete  event  technology  for  model-based  prediction, 
to  support  design  and  evaluation  of  intelligent  software  for  control  and  fault  management.  Although  discrete 
event  simulation  has  typically  been  used  for  stochastic  analyses  of  scenarios,  CONFIG  simulations  are 
deterministic,  for  specific  states  and  inputs. 

CONFIG  uses  a  state  transition  system  formalism  in  a  system  model  made  up  of  a  set  of  connected 
components, or "devices", structured within a configuration or "flow path". The direction of physical flows
and  the  effects  of  flow  reconfigurations  are  efficiently  analyzed  during  simulations.  Two  of  the  basic 
building  blocks  of  a  CONFIG  model  are  devices  and  activities.  Devices  model  the  behavior  of  system 
hardware  components  and  activities  model  actions  in  procedures  or  software.  Examples  of  system  devices 
are  pumps,  valves,  tanks  and  condensers.  Device  relations  represent  the  connections  between  system 
components.  Activity-device  relations  are  used  to  relate  activities  to  system  components  for  control  and 
monitoring  purposes. 

The  modular  discrete  event  modeling  approach  provides  a  framework  for  organizing  and  managing  the 
application  of  more  detailed  knowledge.  In  device  models,  time-related  behavior  models  are  embedded 
within  modes,  and  these  modes  are  within  state-transition  systems.  Two  modes  of  a  simple  valve,  for 
example,  might  be  open  and  closed.  The  way  a  device  interacts  with  connected  devices  can  depend  on  the 
current  mode.  Failures  can  be  modeled  as  modes  or  as  factors  that  precipitate  or  prevent  transitions. 
Transitions  between  device  modes  can  be  determined  by  control  variables,  variable  changes  propagated 
through  inter-device  connections,  or  by  changes  in  system  flows.  Model  structure  can  be  “recomposed” 
during  a  simulation,  as  the  direction  and  activation  of  interconnections  changes. 
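
The device-mode idea can be sketched in a few lines. The code below is only an illustration of a two-mode valve with a failure factor that prevents transitions. The class `Valve`, its event names and the `stuck` flag are hypothetical and are not CONFIG's actual representation or API.

```python
class Valve:
    """Hypothetical two-mode device model (not CONFIG's own data structure)."""

    # State-transition table: (current mode, control event) -> next mode.
    TRANSITIONS = {
        ("closed", "open_cmd"): "open",
        ("open", "close_cmd"): "closed",
    }

    def __init__(self):
        self.mode = "closed"
        self.stuck = False  # failure modeled as a factor that prevents transitions

    def handle(self, event):
        """Apply a control event; a stuck valve ignores every command."""
        if not self.stuck:
            self.mode = self.TRANSITIONS.get((self.mode, event), self.mode)
        return self.mode

    def flow_out(self, flow_in):
        """Mode-dependent behavior: flow passes only while the valve is open."""
        return flow_in if self.mode == "open" else 0.0


valve = Valve()
valve.handle("open_cmd")      # closed -> open
valve.stuck = True            # inject a failure
valve.handle("close_cmd")     # command has no effect on the stuck valve
```

After the failure is injected, the valve stays open and keeps passing flow to any connected device, which is the kind of remote effect a simulation would then propagate.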

Activity  models  are  also  state-transition  models.  Several  levels  of  control  can  be  modeled  as  activities.  An 
activity  might  be  used  to  control  the  positioning  of  a  set  of  valves,  for  example.  States  of  activity  models, 
called  activity  phases,  have  embedded  control  behaviors.  These  behaviors  can  represent  discrete  or 
continuous  control  regimes,  or  elements  of  schedules  or  simulation  scenarios. 




Life  support  system  applications  require  accurate  accounting  of  resource  inventories  transferred  by 
continuous  flow  at  variable  rates  to  various  locations  within  the  modeled  system.  In  CONFIG,  two 
operators, Integrate and Apply-When, are used to periodically compute states or time advances that depend
on continuous changes. The Apply-When operator calls external algebraic functions to determine the time
advance  for  a  rate-dependent  event.  The  Integrate  operator  uses  a  discrete-time  approach,  providing 
periodic  updates  of  variables  based  on  a  rate,  which  may  be  changed  dynamically  by  external  inputs. 
Complex  behavior  emerges  from  the  interaction  of  devices  that  have  simple  models  of  internal  continuous 
processes. 
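
One possible reading of the two operators is sketched below, assuming a single stored quantity that changes at a constant rate between events. The functions `integrate` and `time_until`, and the tank values, are hypothetical stand-ins written for this illustration. They are not the CONFIG operators themselves.

```python
def integrate(level, rate, dt):
    """One periodic update of a stored quantity from its current flow rate."""
    return level + rate * dt


def time_until(level, rate, threshold):
    """Algebraic time advance until level reaches threshold at a constant rate.

    Returns None when the rate never carries the level to the threshold.
    """
    if rate == 0:
        return None
    delay = (threshold - level) / rate
    return delay if delay >= 0 else None


# Hypothetical tank: 10.0 units of gas, drained at 2.0 units per hour.
level, rate = 10.0, -2.0
event_delay = time_until(level, rate, threshold=4.0)  # when to schedule a "low level" event

# The same interval covered by three one-hour periodic updates.
for _ in range(3):
    level = integrate(level, rate, dt=1.0)
```

The first function corresponds to the discrete-time update style and the second to computing a time advance directly. In this sketch both agree that the threshold is reached after three hours.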

CONFIG  provides  an  object-oriented  and  graphical  environment  for  building  models  and  managing 
simulation  tests.  This  environment  supports  incremental  model  development,  maintenance  and  reuse. 


CONFIG  Application  for  Validating  Autonomous  Control  Software 

CONFIG  simulations  were  used  to  validate  IMC  software  that  provided  control  during  the  LMLSTP  Phase 
III  90-day  manned  test.  IMC  sequencer  software  monitored  and  controlled  the  model  rather  than  the  skills 
layer and hardware. The model included diverse components and systems for processing O2 and CO2 gases
in  a  plant  growth  chamber,  crew  chamber  and  incinerator,  and  for  storing  gases  and  transferring  them 
between  chambers  [3].  Figure  2  shows  the  Product  Gas  Transfer  (PGT)  system  model  during  a  simulation. 
The  arrowheads  along  relations  indicate  the  directions  of  active  gas  flows.  The  default  graphic 
representations  are  rectangles  for  devices  and  elongated  ovals  for  activities.  Modes  are  indicated  by  the  text 
in  the  rectangles  and  ovals,  or  by  appearances  of  icons  that  indicate  device  modes. 


The modeled devices include the chambers, various gas processors that convert O2 to CO2 or vice versa, gas
concentrators,  and  PGT  hardware  that  directs  and  regulates  flow  and  pressure.  The  modeled  activities 
include  discrete  and  continuous  control  of  the  hardware  that  directs  and  regulates  flow  and  pressure, 
schedules  for  crops  and  human  activities,  and  some  manual  procedures.  The  activity  models  represent 
control  by  the  3T  planning  or  skills  tiers,  local  controllers  or  human  operators.  There  are  only  two 
continuous  feedback  controllers  in  the  3T  skills  tier.  The  rest  of  the  control  is  discrete,  based  on  deadbands 
and  schedules. 
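
Deadband-based discrete control can be illustrated with a small hysteresis rule. This is a minimal sketch. The function `deadband_control`, the band limits and the readings are invented for the example and are not taken from the 3T software.

```python
def deadband_control(level, low, high, injecting):
    """Return the new on/off command for an injector with hysteresis."""
    if level < low:
        return True       # below the band: turn injection on
    if level > high:
        return False      # above the band: turn injection off
    return injecting      # inside the band: keep the previous command


# Hypothetical CO2 readings (ppm) and band limits.
readings = [1150, 1050, 990, 1050, 1150, 1210, 1150]
injecting = False
commands = []
for ppm in readings:
    injecting = deadband_control(ppm, low=1000, high=1200, injecting=injecting)
    commands.append(injecting)
```

The command changes only when a reading leaves the band, so the controller issues discrete events rather than a continuous control signal.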




Simulation-based  testing  followed  unit  testing  and  hardware  integration  testing  by  the  software  developers. 
The  interactive  simulation-based  testing  used  multiple  “batch”  long-duration  scenarios,  running  at  about  20 
times  real  time.  The  testing  verified  software  activities  during  nominal  operations  in  a  system  context,  and 
tested  software  response  to  hardware  problems  and  imbalances.  The  test  results  are  documented  in  [4]  and 
[7].  The  testing  uncovered  some  software  bugs  and  some  issues  concerning  software  requirements.  The 
most  interesting  issue  was  observed  in  the  context  of  a  complex  interaction  including  elements  of  the  crew 
chamber  and  the  plant  growth  chamber.  It  is  unlikely  that  this  type  of  software  problem  would  have  been 
found  during  conventional  software  testing  since  it  involved  a  sequence  of  interactions  of  multiple  devices 
and  controllers  in  the  system  that  would  be  difficult  to  conceive  of  or  emulate  in  conventional  software 
testing. 

During simulation tests, when the CO2 accumulator was depleted the IMC software switched the source of
CO2 from the accumulator to the facility supply as intended, except when the plant chamber CO2
concentration was between the alert-low and alarm-low thresholds. When the plant chamber CO2
concentration was below the alert-low level (1000 ppm) and the CO2 accumulator on the crew chamber side
was also at its alert-low limit (12 psi), the IMC software failed to switch to the facility CO2 supply. The
IMC software disabled continuous flow into the plant chamber and handed over control to the local CO2
controller in the plant chamber. The local controller then switched to the backup pulse injection system to
raise the CO2 level in the plant chamber. Because the IMC software had failed to switch the CO2 source
from the accumulator to the facility supply, the backup system drew CO2 from the depleted accumulator.
The CO2 level in the plant chamber continued to drop even with the backup system on.
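
The way a sweep over operating states can expose such a gap is sketched below. The paper does not give the actual IMC switching rule, so `buggy_select_source` is an invented stand-in that merely reproduces the reported symptom. Only the two thresholds (1000 ppm and 12 psi) come from the text. The checker `find_violations` and the sampled states are likewise assumptions.

```python
PLANT_ALERT_LOW_PPM = 1000   # plant chamber alert-low level, from the text
ACCUM_ALERT_LOW_PSI = 12     # accumulator alert-low limit, from the text


def buggy_select_source(plant_ppm, accum_psi):
    """Invented stand-in that reproduces the symptom; not the real IMC rule."""
    if accum_psi <= ACCUM_ALERT_LOW_PSI and plant_ppm >= PLANT_ALERT_LOW_PPM:
        return "facility"
    return "accumulator"


def find_violations(select_source, ppm_values, psi_values):
    """Return states where a depleted accumulator is still the CO2 source."""
    return [
        (ppm, psi)
        for ppm in ppm_values
        for psi in psi_values
        if psi <= ACCUM_ALERT_LOW_PSI and select_source(ppm, psi) != "facility"
    ]


# Hypothetical sample of plant chamber and accumulator states.
violations = find_violations(buggy_select_source, [800, 1000, 1200], [10, 12, 20])
```

In this sketch the violations appear only where both quantities are low at once, which matches the observation that the problem arose from an interaction of conditions across the two chambers.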


CONCLUSION 

The  CONFIG  models  successfully  represented  the  heterogeneous  set  of  model  types  that  were  required  for 
testing  intelligent  reactive  control  software.  The  full  range  of  model  representations  was  necessary  for  the 
test,  especially  for  control  and  reconfiguration.  For  future  programs,  simulation-based  validation  testing  will 
include  more  complete  coverage  of  off-nominal  scenarios.  The  strength  of  this  type  of  simulation-based 
validation  is  in  production  of  cascaded  effects  of  events  in  a  complex  system,  to  produce  novel  operational 
scenarios  that  intelligent  software  must  handle.  In  a  system  as  complex  as  the  Phase  III  testbed,  even  a 
seemingly  innocuous  deviation  from  normal  operating  conditions  may  have  more  serious  consequences  than 
expected. This same type of simulation should also be used to analyze operational scenarios in support of
requirements development. Reliability and safety engineers remind us that software problems are
more  often  associated  with  lack  of  system  understanding  or  requirements  errors  than  with  coding  errors. 
Dynamic  interactive  simulation  of  this  type  can  help  get  the  requirements  right  for  management  of  complex 
systems. 

Current work includes CONFIG extensions to support interactive operator-in-the-loop evaluations of
strategies  for  adjustable  autonomy,  which  supports  operator  intervention  at  multiple  levels  when 
appropriate.  An  interface  has  been  developed  between  CONFIG  and  the  lowest  skills  layer  of  the  autonomy 
software,  to  support  testing  of  all  layers  of  the  architecture.  More  models  are  being  developed,  to  support 
engineering  and  operation  of  autonomous  production  plants  for  consumables  on  Mars. 

REFERENCES 

1. P. Bonasso, R.J. Firby, E. Gat, D. Kortenkamp, D. Miller and M. Slack, 1997. Experiences with an
architecture  for  intelligent,  reactive  agents.  J.  Experimental  and  Theoretical  AI,  9,  237-256. 

2.  R.  J.  Firby,  1997.  The  RAP  Language  Manual.  Neodesic  Corporation. 

3.  L.  Fleming,  T.  Hatfield  and  J.  Malin,  1998.  Simulation-Based  Test  of  Gas  Transfer  Control  Software: 
CONFIG  Model  of  Product  Gas  Transfer  System.  Automation,  Robotics  and  Simulation  Division 
Report,  AR&SD-98-017,  NASA  Johnson  Space  Center. 

4. L. Fleming, T. Hatfield and J. Malin, 1998. Simulation-Based Test of Gas Transfer Control Software:
Software Validation Test Results. Automation, Robotics and Simulation Division Report,
AR&SD-98-018, NASA Johnson Space Center.

5. J. T. Malin, 1998. Some Roles of Models in Monitoring and Control for BIO-Plex. SAE Paper No.
981727.  SAE  28th  International  Conference  on  Environmental  Systems,  Danvers,  Mass. 




6.  J.  T.  Malin,  B.  D.  Basham  and  R.  A.  Harris,  1990.  Use  of  qualitative  models  in  discrete  event 
simulation  for  analysis  of  malfunctions  in  continuous  processing  systems.  In  Mavrovouniotis,  M.  ed., 
Artificial  Intelligence  in  Process  Engineering.  San  Diego,  Calif.:  Academic  Press,  37-79. 

7.  J.  T.  Malin,  L.  Fleming  and  T.  Hatfield,  1998.  Interactive  simulation-based  testing  of  product  gas 
transfer  integrated  monitoring  and  control  software  for  the  Lunar  Mars  Life  Support  Phase  III  Test. 
SAE  Paper  No.  981769.  SAE  28th  International  Conference  on  Environmental  Systems,  Danvers  Mass. 

8.  J.  T.  Malin  and  D.  B.  Leifker,  1991.  Functional  modeling  with  goal-oriented  activities  for  analysis  of 
effects  of  failures  on  functions  and  operations.  Informatics  &  Telematics  8(4),  353-364. 

9.  J.  T.  Malin,  D.  Ryan  and  L.  Fleming,  1993.  CONFIG  -  Integrated  engineering  of  systems  and  their 
operation.  In  Proc.  Fourth  National  Technology  Transfer  Conference,  NASA  Conference  Publication 
CP-3249,  97-104. 

10.  J.  T.  Malin,  D.  Ryan  and  L.  Fleming,  1994.  Computer-aided  operations  engineering  with  integrated 
models  of  systems  and  operations.  In  Proc.  Dual  Use  Space  Technology  Transfer  Conference  and 
Exhibition,  NASA  Conference  Publication  CP-3263,  455-461. 

11. D. Schreckenghost, P. Bonasso, D. Kortenkamp and D. Ryan, 1998. Three tier architecture for
controlling  space  life  support  systems.  In  Proc.  IEEE  Symposium  on  Intelligence  in  Automation  and 
Robotics. 




Image  Analysis  and  Vision  Systems 
for  Processing  Plants 

Antti J. Niemi*, Heikki Hyotyniemi*, and Raimo Ylinen**

*  Helsinki  University  of  Technology 
Control  Engineering  Laboratory 
P.O.  Box  5400,  FIN-02015  HUT,  Finland 

**  University  of  Oulu 
Systems  Engineering  Laboratory 
P.O.  Box  4300,  FIN-90401  Oulu,  Finland 


ABSTRACT 

Material  flowing  in  a  processing  plant  may  be  visually  observable,  but  its  characterization  by  physical  and 
computational  means  of  analysis  can  prove  difficult.  Problems  are  encountered  in  practice  both  at  optical 
imaging  and  at  extraction  of  features  that  characterize  the  material  or  its  stage  of  processing,  and  at  related 
control  of  the  process.  Intelligent  analysis  of  the  vast  amount  of  data  provided  by  process  vision  systems  is 
discussed  in  the  paper,  in  the  light  of  two  continuous  flow  processes,  i.e.  the  froth  flotation  of  minerals  and 
the  wet  end  control  of  the  paper  machine. 


INTRODUCTION 

In the process industries, visible features of moving and flowing materials have traditionally been
monitored by a human operator. Such visual observations are subjective and inaccurate, and they cannot be
converted directly into standard physical transmission signals. Suitable devices for instrumented imaging
and visual observation had to wait for the advent of modern television technology. The classical TV camera
was then used for closed-circuit monitoring of processes until, in the 1970s, the semiconductor camera
proved better able to operate under industrial conditions. It transmitted sequences of image signals that
could be stored, processed by computer, reproduced at remote locations and used to control industrial
processes in real time. In fact, it turned out that a one-dimensional, repeatedly scanned array of sensor
elements was sufficient to form a continuous 2-D image of materials and objects that were carried steadily
past a sensor array camera, e.g. on a conveyor belt.

However, despite the development of electronics, the interfaces between vision systems and process
environments have often proved problematic, which has retarded the development of applications. An
industrial material or object cannot be specially prepared for the needs of on-line image analysis, so the
problems of presenting it for viewing have to be solved together with those of the analysis. Problems are
created, for example, by unevenness of the material surface; by variable transparency and reflectance of
the often large object surface, which may include specularly reflecting elements or parts; by scale and dust
on surfaces; and by vapors or particles in the atmosphere. These call for case-specific arrangements of
homogeneous or structured illumination of a large surface and of the viewing direction in relation to it.
Successful imaging requires that these arrangements favour the distinction of those features of the object
which relate to the interesting characteristics of its composition or structure, but imaging may be
geometrically constrained by the dimensions of the process machinery and by the shadows it casts.
Deficiencies in presentation or imaging can be partly compensated for by intelligent processing of the
recorded data, but it is generally advisable to remove or reduce them at their source as far as possible.

The imaging of industrial processes is usually aimed not only at analysis of the object and display of the
result to the operator, but also at control, preferably automatic control, of the imaged process or of a
downstream process. Problems of another kind are met here. They are connected with the controllable inputs
of the process and with their effects on the process variables, including those which are measured or
evaluated by the image analyzer. Intelligent control algorithms and choices of control inputs are also
needed here in order to reach optimal control of the process.

0-7803-5489-3/99/$10.00 ©1999 IEEE.

If  the  quantities  that  are  to  be  extracted  from  the  images  are  known  a  priori,  and  if  the  process  structure  is 
known,  a  closed-loop  control  can,  in  principle,  be  implemented  based  on  the  information  that  is  obtained 
through  image  analysis.  However,  if  the  process  structure  is  not  known,  visual  inspection  can  be  applied  to 
acquire  new  intuition:  features  can  be  extracted  based  on  the  correlations  observed  in  the  data.  Hopefully, 
these  features  capture  the  characteristic  behavioral  patterns  of  the  process,  revealing  some  hidden  structure 
underlying  the  data.  Construction  of  the  features  is  a  mathematically  involved  task,  and  a  specialized 
framework  is  needed. 

Problems of imaging, image analysis and related process control are discussed in the present paper in the
light of two example cases that illustrate the level reached in their solution. They relate to analysis of
flotation in the mineral industry and to analysis of the dry line of a paper machine.


THEORETICAL  FRAMEWORK 

The goal of data modelling is compression of data, so that a large number of process measurements can
be expressed in terms of only a few parameters. Sometimes, however, the primary goal is reconstruction of
the original source signals when only their mixtures can be observed. Both of these goals can be expressed
as a clever recombination of the measurements.

In  what  follows,  assume  that  the  measurement  data  consist  of  clusters  characterized  by  linear  subspaces  (or 
linear  varieties)  in  the  data  space.  The  observations  are  results  of  various  (at  least  locally)  linearly  additive 
features  visible  simultaneously  in  the  different  information  sources.  Furthermore,  assume  that  the  observed 
high-dimensional data vector $y$ can be expressed as a linear combination, a weighted sum of N distinct
features $\theta_i$ with weights $\phi_i$, so that

$$ y = \sum_{i=1}^{N} \phi_i \theta_i \qquad (1) $$

The  features  span  a  relatively  low-dimensional  feature  space  where  all  of  the  observed  data  patterns  can 
approximately  be  presented  using  the  domain-oriented  feature  coordinates. 

There are different approaches to defining the features characterizing the process; most of them are rather
mathematically oriented. They are often based on different kinds of local artificial features, e.g., splines or
wavelets, or other orthogonal function families. The problem with these methods is that the new
parameterization in terms of the artificial features does not offer new intuition about what kind of physical
phenomena are present in the observations. To reach "smart" representations of the process state, the
features have to be domain-oriented.

A traditional approach to finding data-oriented features is to use, for example, the principal components,
i.e. the eigenvectors of the data correlation matrix, as features (Principal Component Analysis, PCA; see [1]).
The principal components corresponding to the largest eigenvalues are the most important ones, capturing
the directions of maximum variance in the data, and the other principal components can perhaps be neglected.
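
As an illustration of the PCA baseline just described, the following Python/NumPy sketch takes the eigenvectors of a sample estimate of the data correlation matrix and keeps those with the largest eigenvalues. The function name `principal_components` and the synthetic data are assumptions made for this example only; they are not taken from the experiments reported in this paper.

```python
import numpy as np

def principal_components(Y, m):
    """Return the m largest eigenvalues and eigenvectors of the data correlation matrix.

    Y : array of shape (n_samples, n_dims), one observation per row.
    """
    R = Y.T @ Y / Y.shape[0]                 # sample estimate of E{y y^T}
    eigvals, eigvecs = np.linalg.eigh(R)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]    # indices of the m largest eigenvalues
    return eigvals[order], eigvecs[:, order]

# Synthetic data (assumption): large variance along the first axis, small along the others.
rng = np.random.default_rng(0)
Y = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.2])
pca_eigvals, pca_vecs = principal_components(Y, m=2)
```

The columns of `pca_vecs` are mutually orthogonal by construction, which is the property that the sparse feature model discussed in this section relaxes.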

The problem with PCA is that it tries to explain all of the N possible features simultaneously using only m <
N memory units, resulting in a model consisting of some kind of "average" features. If an observation
includes some subset of m features out of all N available feature candidates, it is evident that if exactly
these m domain-oriented features can be detected, the error variance will be zero. This kind of sparse feature
model does not minimize the number of memory units needed to achieve the same level of accuracy as the
mathematically optimal principal component model. Instead, it minimizes the number of active units, which
means that a minimum amount of data transfer is needed.

There  are  various  methods  for  sparse  coding,  most  notably  perhaps  the  anti-Hebbian  learning  (for  example, 
see  [2],  [3],  and  [4]).  In  what  follows,  an  approach  that  is  specially  tailored  for  practical  applications  is 
presented  (see  [5]  and  [6]). 




Learning  algorithm 

The sparsity of the data representation [1] means that the overall or global model is nonlinear but locally
linear. To make the features more easily linearly separable, the data preprocessing phase is important.
First, all information sources must be made structureless. This also means that, in addition to process
measurements, the a priori information has to be coded as a set of real-valued (sometimes binary)
variables. The following algorithm is based on the assumption that the features can be found from the
dependencies or correlations between the data units.

The "feature map", an extension of the self-organizing map, consists of nodes $i$, $1 \le i \le N$, with prototype
(feature estimate) vectors $\theta_i$. The adaptation of the estimates is carried out as follows [7]:

1. Take the next data vector sample $y$.

2. Select the prototype vector with index $c$ having the best (positive or negative) correlation with the
vector $y$:

$$ c = \arg\max_{1 \le i \le N} |\phi_i| = \arg\max_{1 \le i \le N} |\theta_i^T y| \qquad (2) $$

3. For each of the nodes, apply the Kohonen-type learning (or adaptation) algorithm [8] using the
vector $\phi_c y$ as input:

$$ \theta_i \leftarrow \theta_i + \gamma \, h(i,c) \, (\phi_c y - \theta_i) \qquad (3) $$

where the parameter $h(i,c)$ defines the neighborhood relation between the nodes $i$ and $c$, and $\gamma$
defines the adaptation rate.

4. Normalize the prototype vectors:

$$ \theta_i \leftarrow \frac{\theta_i}{\sqrt{\theta_i^T \theta_i}} \qquad (4) $$

5. Eliminate the contribution of the prototype number $c$ by setting

$$ y \leftarrow y - \phi_c \theta_c $$

6. If the iteration limit $m$ has not been reached, go back to Step 2; otherwise go back to Step 1.


After  the  features  have  converged,  the  algorithm  can  be  used  for  clustering,  i.e.,  for  finding  the  actual 
operation  regime,  together  with  the  fine  structure  or  the  feature  weights  within  the  clusters. 
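
The adaptation steps listed above can be sketched in Python/NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the neighborhood function is reduced to a winner-only update (h(i,c) = 1 for i = c and 0 otherwise), the adaptation rate is held constant, and the function names `ggha_fit` and `ggha_weights`, the number of epochs and the synthetic data are all hypothetical.

```python
import numpy as np

def ggha_fit(Y, N, m, gamma=0.05, epochs=20, seed=0):
    """Adapt N unit-length prototype vectors to the rows of Y.

    Y : (n_samples, n_dims) data; m : number of features extracted per sample.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(N, Y.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    for _ in range(epochs):
        for sample in Y[rng.permutation(len(Y))]:      # Step 1: next data vector
            y = sample.astype(float)                   # astype returns a working copy
            for _ in range(m):                         # Step 6: iteration limit m
                phi = theta @ y                        # correlations theta_i^T y
                c = int(np.argmax(np.abs(phi)))        # Step 2: best-matching prototype
                h = np.zeros(N)
                h[c] = 1.0                             # assumption: winner-only neighborhood
                theta += gamma * h[:, None] * (phi[c] * y - theta)      # Step 3
                theta /= np.linalg.norm(theta, axis=1, keepdims=True)   # Step 4
                y -= phi[c] * theta[c]                 # Step 5: remove the contribution
    return theta

def ggha_weights(theta, y, m):
    """Sparse feature weights of one data vector, using the same selection loop."""
    y = np.asarray(y, dtype=float).copy()
    w = np.zeros(len(theta))
    for _ in range(m):
        phi = theta @ y
        c = int(np.argmax(np.abs(phi)))
        w[c] += phi[c]
        y -= phi[c] * theta[c]
    return w

# Synthetic example (assumption): two non-orthogonal features, each present in about half the samples.
rng = np.random.default_rng(1)
features = np.array([[1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
                     [0.0, 1.0, 1.0, 1.0, 0.0, 0.0]])
features /= np.linalg.norm(features, axis=1, keepdims=True)
mix = rng.normal(size=(300, 2)) * (rng.random((300, 2)) < 0.5)
Y = mix @ features
theta = ggha_fit(Y, N=2, m=2)
w0 = ggha_weights(theta, Y[0], m=2)
```

With a wider neighborhood function the update in Step 3 would also move the prototypes adjacent to the winner, as in a self-organizing map; the winner-only choice is made here only to keep the sketch short.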

The algorithm is closely related to the Generalized Hebbian Algorithm (GHA) that can be used for
calculation of the principal components of $E\{y y^T\}$ [9]. Whereas GHA always results in an orthogonal set of
features, the presented algorithm can lead to nonorthogonal ones. The algorithm could be called the
Generalized Generalized Hebbian Algorithm (GGHA). Its operation can be described as a
self-organizing algorithm of Kohonen type, which combines the principal component analysis and cluster
analysis methods. It can also be considered a non-orthogonal factor analysis method.


VISUAL  FEATURES  AND  HIDDEN  VARIABLES 
OF  FROTHS  IN  FLOTATION  OF  MINERALS 

Flotation  is  used  in  mineral  processing  industries  for  separation  of  grains  of  valuable  minerals  from  those 
of  side  minerals.  In  the  continuous  flow  flotation  cell  (Fig.  1),  air  is  introduced  into  the  agitated  suspension 
of  ore  and  water. 


Fig. 1. Flotation cell

The desired mineral tends to adhere to air bubbles and to rise to the froth layer that is removed as the
product of the cell, while the main part of the slurry exits from its lower part. The separation of minerals
requires that the desired mineral is given a water-repellent property, which is produced by selective
adsorption of chemicals at a preceding process stage. A flotation plant always comprises a large number
of cells arranged as a network.

The  process  operators  inspect  the  froth  by  eye,  in  order  to  make  observations  on  its  individual 
characteristics  or  general  appearance,  and  to  use  these  as  a  background  of  their  manual  control  actions. 
Various  types  of  froths  have  been  characterized  by  qualitative  terms,  like:  watery,  stiff,  polyhedral,  shiny, 
porridge  type,  flat,  stormy  etc.,  in  addition  to  characterization  by  colour.  Instrumental  methods  have  been 
developed  more  recently  for  imaging  and  more  quantitative  analysis  of  scenes  of  froth. 

Some instrument systems that consist of a black-and-white or colour video camera followed by a computer
have been tested on line in flotation plants [10,11]. They are reported to be applicable to classification of
froths or to extraction of physical features, such as average bubble size, size distribution and shape
parameters of the bubbles, speed of froth and colour parameters. On the basis of the data displayed, the user
may decide how to use such information, e.g. for control of the froth or of flotation.

The  methods  applied  for  textural  characterization  and  classification  of  froths  are  based  on  well-known 
statistical  signal  processing  techniques  like  histograms,  Fourier  transforms,  power  spectra  [12]  and  grey 
level  dependence  matrices  [13].  They  produce  either  phenomenological  or  artificial  features  for 
classification.  The  classification  is  carried  out  by  standard  statistical  techniques  or  by  use  of  soft  computing 
methods  like  machine  learning  or  neural  networks. 

Another study of industrial flotation froths indicated, among other things, that the optimal illumination for
distinguishing the froth structure may differ from that for distinguishing colours [14]. It was shown, for
example, that in a bank of apatite flotation cells the colour of the froth changes in parallel with the decrease
of the apatite content of the froth. The froth structure was a central point of interest in sphalerite flotation,
in which a collapse of the froth may sometimes occur unexpectedly. In addition to the characterization of this
phenomenon by means of quantities which can be evaluated directly from the black-and-white image
information, a neurocomputing method was applied [15]; for a PCA approach to the same process, see [16].

Process  experiments 

The data were recorded in the flotation plant of the Pyhäsalmi Mine of Outokumpu Finnmines Oy, where an
ore containing sphalerite and pyrite minerals is processed. The froth surface was recorded using a
monochrome camera once every 20 seconds for about 30 minutes. Originally, the process was in the steady
state, but at 15 minutes from the beginning an extra amount of copper sulphate CuSO4 was fed into the
conditioner preceding the tested cell, and the response was recorded.

Normally, an increase in CuSO4 makes sphalerite more floatable, but an excessive amount of this
conditioning chemical leads to an abrupt collapse, or "poisoning", of the froth surface. This means a
shutdown of production, so it is extremely important to detect the collapse as fast as possible. In this
experiment the poisoning was seen visually to start at 20 minutes, but the level dropped later, and
completely at 27 minutes. After that the process was returned to normal operation by adding an extra
amount of another chemical.

Figure 2 reveals that the recognition of the collapse situation can be based on the appearance of the froth;
however, defining the features that characterize the exceptional situation in mathematical terms is by no
means straightforward.


Fig.  2.  Typical  view  of  a  normal  froth  surface  (left),  and  of  a  collapsed  surface  (right) 


The  image  data  were  processed  off-line  using  the  Matlab  software.  The  GGHA  algorithm  was  applied  first 
by  using  only  the  intensity  histogram  information  of  the  images  in  the  data  vector  (Fig.  3).  No  external 
information  or  classification  knowledge  was  used  (see  [15]). 


Fig.  3. Typical  intensity  histograms  of  the  normal  surface  and  the  collapsed  one 


Next, the histogram data were augmented with the power spectrum data. The concentric rings on the
frequency domain image were regarded as single data units. The zero frequency component was eliminated
and base 10 logarithms were calculated, to make the spectral components, whose numerical values vary
widely, more comparable. In this way the spectral data were presented as a real-valued vector of only 256
elements (see Fig. 4).
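
The construction of such a data vector can be sketched in Python/NumPy as below. The text does not state how the values within one ring were combined, so taking the mean power per ring is an assumption, as are the ring width of one frequency sample, the histogram normalization, the small constant `eps` that guards the logarithm, and the function name `froth_data_vector`.

```python
import numpy as np

def froth_data_vector(image, n_bins=256, n_rings=256, eps=1e-12):
    """Concatenate an intensity histogram with a log10 ring-averaged power spectrum."""
    img = np.asarray(image, dtype=float)

    # Intensity histogram, normalized by the number of pixels (assumes grey levels in 0..255).
    hist, _ = np.histogram(img, bins=n_bins, range=(0.0, 256.0))
    hist = hist / img.size

    # 2-D power spectrum with the zero-frequency component shifted to the centre.
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    rows, cols = img.shape
    r, c = np.indices((rows, cols))
    ring = np.rint(np.hypot(r - rows // 2, c - cols // 2)).astype(int)

    # Ring 0 is the zero-frequency component and is dropped; rings 1..n_rings are kept.
    keep = (ring >= 1) & (ring <= n_rings)
    sums = np.bincount(ring[keep], weights=power[keep], minlength=n_rings + 1)
    counts = np.bincount(ring[keep], minlength=n_rings + 1)
    ring_mean = sums[1:] / np.maximum(counts[1:], 1)   # assumption: mean power per ring

    spectrum = np.log10(ring_mean + eps)               # base 10 logarithm of the ring values
    return np.concatenate([hist, spectrum])

# Synthetic 512 x 512 grey-level frame (assumption), giving 256 histogram bins + 256 rings.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(512, 512))
vector = froth_data_vector(frame)
```

Because the histogram and spectral parts have very different numerical ranges, some further scaling before feature extraction may be needed in practice; the text does not specify one, so none is applied here.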




Fig.  4.  Typical  power  spectra  of  a  normal  surface  and  a  collapsed  one 

The algorithm was set to search for four distinct features (N = 4) using both the histogram and spectrum
data. Two meaningful features, called "typical operation" and "rate of poisoning", were found; the other two
had no interpretation (see Fig. 5).


Fig.  5.  The  two  most  significant  of  the  extracted  features  (normalized) 


The  weight  for  the  feature  representing  the  poisoning  state  is  shown  in  Fig.  6  as  a  function  of  time.  The 
collapse  is  clearly  visible.  It  can  be  argued  that  the  algorithm  has  been  capable  of  automatically  extracting 
relevant  information  from  the  process;  it  must  be  recognized,  however,  that  the  features  only  reflect 
dependencies  between  data  units.  The  analysis  of  the  results  has  to  be  carried  out  by  a  domain  area  expert. 


Fig.  6.  Weight  for  poisoning  (image  index  60  corresponds  to  run  of  20  minutes) 




IMAGING,  CHARACTERIZATION  AND  CONTROL 
OF  THE  "DRY  LINE"  IN  A  PAPER  MACHINE 

Raw  material  input  to  a  paper  machine  is  dilute  wood  fibre  pulp.  This  is  fed  onto  a  plane,  broad  wire  on 
which  it  settles  as  a  layer.  The  main  part  of  the  water  content  of  the  pulp  is  removed  through  a  multitude  of 
holes  in  the  wire,  as  it  transports  the  pulp  toward  the  pressing  and  drying  sections  of  the  machine. 

The disappearance of the liquid water from the surface of the pulp on the wire is manifested by a "dry line"
(Fig. 7). Its location and form have always been of special interest to the operator, although the
unaided eye finds at most only a part or parts of it. Replacing the human eye with a regular or
opto-electric camera has not, as yet, improved its observation.


Fig.  7.  Illustration  of  the  dry  line  monitoring  setup 

The dry line can be reproduced successfully by two new methods which take the optical requirements of
illumination, viewing and image formation in the dry line range properly into account [17,18]. They
provide the video camera with a view of the wire that consists of two fields of different, homogeneous
luminosities, one preceding and the other following the dry line, while avoiding bright areas that would
blind the detector or cause blooming. The computer which receives the image signal is then able to extract
the complete dry line, as the borderline of the two fields, by means of an edge detection algorithm, and to
reproduce it on a graphical display.
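
A minimal sketch of how a borderline between two homogeneous luminosity fields can be located is given below in Python/NumPy. The edge detector actually used in the installations is not specified in the text; the smoothed-gradient approach, the binomial smoothing kernel, the image orientation (rows along the machine direction, columns across the wire) and the name `extract_dry_line` are assumptions made for illustration.

```python
import numpy as np

def extract_dry_line(image):
    """For every cross-direction column, locate the strongest luminance step.

    Returns the boundary position in row units (k + 0.5 lies between rows k and k + 1).
    """
    img = np.asarray(image, dtype=float)
    grad = np.diff(img, axis=0)                           # luminance change between adjacent rows
    kernel = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0   # binomial smoothing against pixel noise
    smooth = np.apply_along_axis(np.convolve, 0, grad, kernel, mode="same")
    return np.argmax(np.abs(smooth), axis=0) + 0.5

# Synthetic wire view (assumption): bright field before the dry line, darker field after it.
rng = np.random.default_rng(0)
n_rows, n_cols = 200, 128
true_edge = np.rint(100 + 10 * np.sin(np.linspace(0, np.pi, n_cols))).astype(int)
rows = np.arange(n_rows)[:, None]
view = np.where(rows < true_edge[None, :], 200.0, 60.0) + rng.normal(0.0, 5.0, (n_rows, n_cols))

dry_line = extract_dry_line(view)        # one location per cross-direction position
mean_location = dry_line.mean()          # average location, as shown to the operator
```

The per-column maximum works only because the illumination produces exactly two homogeneous fields; a view with glare or shadows would need a more robust detector.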


Fig.  8.  Digitally  reproduced  dry  line  displayed  on  a  monitor. 




Both methods have been installed and tested in industrial paper machines. Fig. 8 shows the geometrically
rectified wire image together with the digitally extracted dry line, displayed repeatedly in the control room
of a large paper board machine at the Imatra mills of Stora Enso Company in Finland. The average location
of the dry line is computed and shown on the monitor after each measurement, and is also passed to a
controller for automatic control. The methods of dry line measurement and control have been patented in
several countries.

The closed loop control of the dry line location is accomplished through the headbox slice, which regulates
the feed of pulp onto the wire as instructed by the controller. The benefit of this control method is its fast
reaction to disturbance inputs entering the head end of the machine, whereas conventional controllers,
which rely only on measurements at the dry end of the machine, are handicapped by a long process delay.
Under the control described, the dry line follows setpoint changes quickly and stays close to
its setpoint, both in the presence of disturbances and during transitions between different product grades
[18,19,20].
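
The structure of such a feedback loop can be illustrated with a small simulation in plain Python. The controller type, the process model and all numerical values below are hypothetical: the text does not describe the controller used, so a discrete PI law acting on a first-order process model is assumed purely to show the measured dry line location being driven to its setpoint through the slice command.

```python
def simulate_dry_line_pi(setpoint, n_steps=200, dt=0.5, gain=1.0, tau=5.0, kp=1.0, ki=0.5):
    """Simulate a PI loop on a hypothetical first-order dry line model.

    Returns the dry line location (arbitrary units) after every time step.
    """
    position = 0.0      # measured average dry line location
    integral = 0.0      # integral of the control error
    history = []
    for _ in range(n_steps):
        error = setpoint - position
        integral += error * dt
        slice_cmd = kp * error + ki * integral                # PI law: command to the headbox slice
        position += dt / tau * (gain * slice_cmd - position)  # first-order response (assumption)
        history.append(position)
    return history

trajectory = simulate_dry_line_pi(setpoint=2.0)
```

Since the dry line is measured at the wet end, the loop contains no long transport delay in this sketch; a controller relying on dry end measurements would have to be modelled with such a delay, which limits the usable gains.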

Changes  of  the  form  of  dry  line  can  be  effected  by  one  or  more  of  the  slice  screws,  an  array  of  which  is 
available  for  adjustment  of  the  crosswise  distribution  of  the  feed  flow  of  pulp.  The  control  of  the  form  or 
profile  of  the  dry  line  requires  that  the  response  of  the  dry  line  to  the  change  of  setting  of  a  single  screw  is 
known.  Fig.  9  shows  the  experimentally  obtained  response  of  the  dry  line  to  the  adjustment  of  a  single 
actuator  [18].  The  response  is  similar  to  the  single  actuator  responses  of  the  dry  basis  weight  of  the  final 
product  reported  in  the  literature  by  several  authors.  It  appears  that  changes  in  the  latter  can  be  predicted  on 
the  basis  of  the  dry  line  measurement,  immediately  after  a  disturbance  enters  from  the  headbox. 
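
One way to use a known single-actuator response for profile control is sketched below in Python/NumPy. It assumes that the responses of the individual slice screws superpose linearly and have the same shape at every cross-direction position; neither assumption is stated in the text. The Gaussian response shape, the screw spacing and the names `response_matrix` and `screw_adjustments` are likewise hypothetical.

```python
import numpy as np

def response_matrix(single_response, centre, n_points, actuator_positions):
    """Column j holds the single-actuator response shifted to actuator position j."""
    single_response = np.asarray(single_response, dtype=float)
    G = np.zeros((n_points, len(actuator_positions)))
    idx = np.arange(n_points)
    for j, pos in enumerate(actuator_positions):
        src = idx - pos + centre                       # index into the measured response
        valid = (src >= 0) & (src < len(single_response))
        G[valid, j] = single_response[src[valid]]
    return G

def screw_adjustments(G, profile_error):
    """Least-squares screw changes that best cancel a measured profile deviation."""
    du, *_ = np.linalg.lstsq(G, -np.asarray(profile_error, dtype=float), rcond=None)
    return du

# Hypothetical bell-shaped response, 33 samples wide, centred on sample 16.
d = np.arange(-16, 17)
bump = np.exp(-0.5 * (d / 3.0) ** 2)
positions = np.arange(8, 128, 8)                       # 15 hypothetical screw positions
G = response_matrix(bump, centre=16, n_points=128, actuator_positions=positions)

u_disturb = np.zeros(len(positions))
u_disturb[3] = 0.4                                     # deviation caused at one screw
profile_error = G @ u_disturb
du = screw_adjustments(G, profile_error)               # correction proposed for every screw
```

In this noise-free example the least-squares solution simply reverses the disturbance at the affected screw; with measured profiles the solution would spread over neighbouring screws according to the width of the response.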


Fig.  9.  Typical  actuator  (location  57  in  cross  direction)  response  in  dry  line  profile 


Feature  extraction 

The feature extraction methodology discussed above can be applied directly to analyse the dry line
profiles. Dry line vectors of length 128, describing the instantaneous shape of the profile, were used as
input data to the algorithm. There is just one "cluster center", representing the average position and profile
of the dry line; four additive features were extracted in the experiment (see Fig. 10). No a priori
information about the process was utilized; the features were determined only by the statistical
dependencies between the dry line measurements.

This approach may reveal some underlying structure that gives valuable insight to the process operator.
Frequently encountered characteristic variations in the dry line profile are reflected in the extracted
features; it is then up to the human experts to react and compensate for the unwanted phenomena. For
example, note that the last feature (lower right corner in Fig. 10) can probably be interpreted in terms of
variations in the flow around the actuator location 32 (cf. Fig. 9). Should the corresponding slice screw or
screws perhaps be checked?


Fig. 10. The four extracted "dry line" features


CONCLUSION 

Image analysis of industrial processes is a new field of technology. Its practical implementations vary
with the particular process, and new applications keep appearing. Although general results cannot be
claimed, some conclusions can be drawn on the basis of the applications elaborated above and of other
cases described in the literature, such as analysis of defects on the surface of steel plate in a rolling
mill [21], of the flame in the combustion chamber of a boiler [22], and of the cross-section and defects of
wooden boards in a saw-mill [23].

Thus it has turned out that whenever the human eye and mind are able to distinguish and define
features of a scene which characterize the material observed or the stage of its processing, these can
also be measured by physical vision systems, even quantitatively. New types of illumination and viewing are
then often needed in order to establish successful imaging of the moving and variable industrial material.
Even if such features have not been defined by human vision, they may be found by intelligent
processing of the abundant data provided by a single image or a succession of images. An intelligent
analysis of the detected features may also be needed whenever it is not known which of the available
actuators are to be manipulated, and to what extent, in order to establish successful control of the
material properties which relate to those features.

ACKNOWLEDGEMENT 

The authors are grateful to Mr. Jouko Berndtson, M.Sc., for the paper machine data material. This research
has been partly financed by the Academy of Finland.

REFERENCES 

1.  A.  Basilevsky,  1994:  Statistical  Factor  Analysis  and  Related  Methods.  John  Wiley  &  Sons,  New  York. 

2. F. Palmieri, J. Zhu, and C. Chang, 1993: Anti-Hebbian learning in topologically constrained linear networks: A
tutorial. IEEE Trans. on Neural Networks, 4 (5), 748-761.

3. P. Foldiak, 1990: Forming sparse representations by local anti-Hebbian learning. Biological Cybernetics, 64 (2), 165-170.

4.  E.  Saund,  1995:  A  multiple  cause  mixture  model  for  unsupervised  learning.  Neural  Computation,  7, 51-71. 

5. H. Hyotyniemi, 1998: Automatic Structuring of Unknown Dynamic Systems. In Soft Computing in Engineering
Design and Manufacturing (P.K. Chawdhry, R. Roy, and R.K. Pant, eds.), Springer-Verlag, London, pp. 410-
419.  Available  at  http://Saato014.hut.fi/Hyotyniemi/publications/97_wsc2.htm. 




6.  H.  Hyotyniemi,  1998:  Structure  from  Data:  AI  Approaches  to  Systems  Modeling.  Proc.  8th  Finnish  AI  Conference 
STeP’98  (P.  Koikkalainen  and  S.  Puuronen,  eds.),  Finnish  Artificial  Intelligence  Society  (FAIS),  Helsinki,  pp.  31- 
40.  Available  at  http://Saato014.hut.fi/Hyotyniemi/publications/98_step_l  .htm. 

7.  H.  Hyotyniemi,  1996:  Constructing  non-orthogonal  feature  bases.  IEEE  Int.  Conf.  on  Neural  Networks  ICNN'96, 
Washington,  DC.  Available  at  http://Saato014.hut.fi/Hyotyniemi/publications/96_icnn.htm . 

8. T. Kohonen, 1995: Self-Organizing Maps. Springer-Verlag, Berlin.

9. S. Haykin, 1994: Neural Networks: A Comprehensive Foundation. Macmillan College Publishing, New York.

10.  A.  Cipriano,  M.  Guarini,  R.  Vidal,  A.  Soto,  C.  Sepulveda,  D.  Mery,  and  H.  Briseno,  1998:  A  real  time  visual  sensor 
for  supervision  of  flotation  cells.  Minerals  Engineering,  11  (6),  489-499. 

11. D.W. Moolman, C. Aldrich, J.S.J. van Deventer, J.J. Eksteen, W.W. Stange, P. Marais, C. Goodall, and R.S.
Veitch, 1995: On-line image analysis to improve industrial flotation plant performance. Preprints of the 8th IFAC
Int. Symp. on Automation in Min. Met. Processing, Sun City, South Africa, 367-371.

12. D.W. Moolman, C. Aldrich, J.S.J. van Deventer, and W.W. Stange, 1994: Digital image processing as a tool for
on-line monitoring of froth in flotation plants. Minerals Engineering, 7, 1149-1164.

13.  D.W.  Moolman,  C.  Aldrich,  J.S.J.  van  Deventer,  and  D.J.  Bradshaw,  1995:  The  interpretation  of  flotation  froth 
surfaces  by  using  digital  image  analysis  and  neural  networks.  Chemical  Engineering  Science,  50, 3501-3513. 

14.  A.J.  Niemi,  R.  Ylinen,  and  H.  Hyotyniemi,  1997:  On  characterization  of  pulp  and  froth  in  cells  of  flotation  plant. 
Int.  J.  of  Mineral  Processing,  51,  51-65. 

15.  H.  Hyotyniemi  and  R.  Ylinen,  1998:  Modeling  of  Visual  Froth  Data.  Preprints  of  the  IFAC  Symposium  on 
Automation  in  Mining,  Mineral  and  Metal  Processing  (J.  Heidepriem,  ed.),  International  Federation  of  Automatic 
Control,  pp.  309-314.  Available  at  http://Saato014.hut.fi/Hyotyniemi/publications/98_mmm.htm . 

16. J. Hatonen, H. Hyotyniemi, G. Bonifazi, S. Serranti, F. Volpe, and L.-E. Carlsson, 1999: Using PCA in controller
strategy design for a flotation process. Preprints of the 14th IFAC World Congress, July 5-9, Beijing, P.R. China.

17.  A.J.  Niemi  and  C.  Backstrom,  1992:  Automatic  observation  of  dry  line  on  wire  for  wet  end  control  of  the  paper 
machine.  Proc.  of  Control  Systems  '92,  CPPA,  Canada,  pp.  261-265.  Also  in  Pulp  and  Paper  Canada,  95  (2):  27 
(1994),  T55-58. 

18. A.J. Niemi, J. Berndtson, and S. Karine, 1998: Improved wet end control of the paper machine. Proc. of Control
Systems '98 (September 1-3, Porvoo, Finland), Finnish Soc. of Automation, 371-378.

19. A.J. Niemi, J. Berndtson, and S. Karine, 1999: Feedback control of the dry line on wire of paper machine. Preprints
of the 14th IFAC World Congress, July 5-9, Beijing, P.R. China.

20.  J.E.  Larsson,  T.  Gustafsson,  and  S.  Ronnback,  1998:  Paper  machine  dry  line  positioning  system.  Proc.  Of  Control 
Systems  '98  (September  1-3,  Porvoo,  Finland),  Finnish  Soc.  of  Automation,  355-362. 

21.  E.  Kiuru,  E.  Keranen,  and  T.  Piironen,  1993:  Improving  overall  performance  of  automatic  systems  for  metal 
surface  inspection.  Proc.  1st  Int.  Conf.  Measurement  and  Instruments  in  the  Metallurgical  Industry,  Shenyang,  P.R. 
China,  24-29. 

22.  J.  Hirvonen,  R.  Lilja,  K.  Ikonen,  and  J.  Nihtinen,  1996:  Image  processing  in  combustion  control.  Int.  Journal  of 
Pattern  Recognition  and  Artificial  Intelligence,  10  (2),  129-137. 

23.  M.  Pietikainen  and  L.F.  Pau  (eds.),  1996:  Machine  Vision  for  Advanced  Production.  Series  in  Machine  Perception 
and  Artificial  Intelligence  (Vol.  22),  World  Scientific  Publishing. 




Progress  in  Japan's 

Intelligent  Manufacturing  Systems  Research  Program 

Yuji  Furukawa 

Tokyo  Metropolitan  University,  Minami-Osawa,  Hachioji,  Tokyo,  192-0364  Japan 
Email: furukawa-yuji@c.metro-u.ac.jp


ABSTRACT 

The IMS Program aims to improve the conditions surrounding manufacturing industries and to develop next-generation manufacturing technologies and systems. It is an unprecedented mechanism, supported by industry, academia and governments, that undertakes industry-led international collaborative R&D among six participating regions: Australia, Canada, the EU, Japan, Switzerland (EFTA) and the USA.

The IMS Program has now seen its 13 selected international projects implemented and is trying to increase the number of projects to more than a hundred. With a growing number of projects and proposals being put forward by member regions, the initiative is expected to further expand its network and enhance global manufacturing cooperation.

Keywords: IMS (Intelligent Manufacturing Systems), mega-competition, global manufacturing, environmentally-conscious manufacturing


HISTORY  OF  IMS  (Figure  1) 

At the end of 1989, Japan proposed the IMS Program as a framework for international collaboration among industry, academia and governments, with the aim of resolving many problems shared by the world's manufacturers. In response, a feasibility study began in 1992 with participants from Japan, Europe, the USA, Australia and Canada. The final report, released in 1994, stated that "the IMS Program is feasible and a full-scale Program should start as soon as possible." April 1995 saw ISC1 (the first meeting of the International Steering Committee) and the announcement of the IMS Program's official start. Since then, a number of projects have been endorsed and are steadily underway. In January 1997, the EU became an official member of the IMS Program. The six current participants are Australia, Canada, the EU, Japan, Switzerland (EFTA) and the USA. Since November 1997, South Korea has also been participating on an experimental basis.

BACKGROUND  OF  IMS 

In recent years, manufacturing industries have faced radical changes in their business environment. The first is Globalization of Manufacturing: production units are being shifted overseas and procurement is crossing international borders, and the lack of uniform standards and differences in human interfaces have resulted in numerous problems. The second is the Changing Market Environment: consumer needs are becoming more diverse and product life cycles are shrinking, which calls for sufficiently flexible production systems and shorter R&D cycles. The third is the Changing Labor Environment: with experienced technicians in short supply and young engineers leaving production, retaining talented staff requires improvements in job content and in the production environment. The fourth is the Response to Environmental Issues: it is now widely accepted that recycling and the efficient use of nonrenewable resources are essential to protect and improve the global environment. These problems have become so large and so complicated that they cannot, and should not, be solved by a single organization or country. A way must be found to solve them through international collaboration among industries, governments and academia. The IMS Program is the first attempt to realize such collaboration in manufacturing R&D.


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




OBJECTIVES  OF  IMS 

In today's globalized economy, every private company is forced to compete against others and therefore has to develop more competitive products. Collaboration among competing companies thus seems difficult. However, a careful look at the development of manufacturing technology reveals a circulation of technological knowledge. Companies compete at the current technological level, but they cooperate at the standardized level, as can be seen in ISO activities, which can be termed post-competitive knowledge. They also cooperate at the basic or enabling level, as can be seen in the development of environmentally conscious combustion engines in the automobile industry, which can be called pre-competitive knowledge. The IMS Program aims to develop many independent projects that remain at either the post-competitive or the pre-competitive level.


[Figure 1: chart of the number of projects against fiscal year (1990 to 2004), with stages marked from the proposal by Japan through the tripartite meetings of Europe, Japan and the USA to the international full-scale program. Participating regions shown: Australia, Canada, Japan and the USA; Switzerland (EFTA); EU; South Korea (tentative); others. The 13 endorsed projects are listed by name (legible entries include HARMONY, MISSION, 3DS, HMS and NGMS).]

Fig.  1.  Evolution  of  the  IMS  Program. 


CURRENT  STATUS  OF  IMS 

The  current  status  of  the  IMS  projects  as  of  May  1999  is  as  follows: 


Projects fully endorsed                  13
Projects conditionally endorsed           1
Proposals under revision                  4
Abstracts endorsed                       30
Abstracts under review or revision       11
Outline proposals                         5


During the period from ISC6 (the sixth meeting of the International Steering Committee), held in November 1997, to 1 May 1999, 41 abstracts were submitted from the participating regions, mostly from the EU. These are currently awaiting development into full proposals. Five outline proposals are waiting to be developed into abstracts and, subsequently, into full proposals. These have been circulated among the participating regions for appropriate action.

Figure 2 shows the participation in the 14 projects for which full proposals have been submitted to date: the 13 endorsed projects and 1 project awaiting endorsement. For the full proposals, the region of origin was identified with the domicile of the first ICP (International Coordinating Partner). A total of 418 organizations from all regions are participating, of which 177 are from academia and 241 are private corporations. Each region has several participants. Figure 2 also shows a comparison of the regions.

[Figure 2: bar chart by region (Japan, USA, Canada, Australia, CH, EU); vertical axis from 0 to 180.]

Fig.  2.  Current  Project  Status  and  Comparison  of  the  Different  Participating  Countries. 


Although Australia, Canada and Switzerland have a fair share, project origination is heavily weighted toward the EU and Japan, reflecting the number of countries that make up the former and the head start enjoyed by the latter. The balance between academia and industry is improving, especially in Japan and the EU. The portfolio in those regions shows that the IMS Program is an industry-led program aimed at both pre-competitive and post-competitive research in the manufacturing industry.

The commitment made by each project was estimated by valuing each person-year at US$100,000 and adding any other costs as stated. Figure 2 includes the estimated total commitments of the 14 projects for which full proposals have been submitted to date; these total US$240 million. Considerable growth is expected, because a large number of abstracts are ready for development into full proposals.


TECHNICAL  TREND  OF  ON-GOING  PROJECTS 

With  a  view  to  providing  a  general  picture  of  how  the  IMS  projects  are  covering  various  areas  of 
manufacturing  technology,  all  the  full  projects  and  abstracts  in  the  current  portfolio  were  roughly  sorted 
into  a  number  of  categories. 

Figure 3 shows their distribution over the topics that the TTWG (Technical Themes Working Group), one of the most important committees under the ISC and led by Y. Furukawa, proposed as Key Seed Technologies for IMS in its report to ISC5 (held in May 1997). To keep the picture simple, each project and abstract was assigned to a single topic.

Assuming that the full projects indicate the current trend, the most popular themes, understandably, are "Advanced processing and assembling technologies", "Virtual manufacturing" and "Advanced design technologies", both at present and for the foreseeable future. On the other hand, assuming that the abstracts indicate the future trend, "Remote monitoring and control technologies" appears to be an expanding topic. No full project addresses "STEP and CALS", and judging from the abstracts submitted to date only a small number of projects will cover this topic.




[Figure 3: horizontal bar chart (axis from 0 to 12) with one bar per category: Virtual Manufacturing; STEP/CALS; Monitoring & Control; New Materials; Knowledge systematisation; Processing & Assembling; Design technologies; Computer & Communication.]

Fig.  3.  Project  Categories  in  the  IMS  Program. 


Figure 4-1 shows all the projects classified by size, i.e. by number of partners. Projects appear to have become smaller: the average number of partners decreased from 39 in the 95 series to 26 in the 97 series. Future projects are likely to be smaller still as the abstracts develop into full proposals.

Figure 4-2 shows the distribution of project size in terms of cost for the full proposals in the 95, 96 and 97 series, as well as for the 20 recent abstracts for which data were available. Project size has decreased steadily in terms of cost as well, and is expected to continue to do so.

It should be noted, however, that the forecast may not match reality, because most abstracts are at too early a stage of formation for their data to be reliable.


Fig.  4-1.  Project  Size  by  Number  of  Partners. 




Fig.  4-2.  Project  Size  by  Cost. 


FUTURE  DEMANDS  OF  MANUFACTURING  INDUSTRIES 

Manufacturers must not only meet varied and changing demands from users at a reasonable price; the chief requirement now is also to minimize the adverse environmental effects of their activities. It is not always possible, however, for manufacturers to meet all demands, including those of the environment.

Taking the foregoing into consideration, the demands on manufacturers can be summarized as follows: provide users with products that best meet their needs, including servicing of those products; achieve the highest possible quality; achieve the lowest possible cost; and minimize the load on the natural environment. How to meet these demands is the basic question for manufacturers to answer. The demands arising from these changes in the manufacturing environment, and the fundamental measures considered in response, are summarized as follows:

•  Mega-competition 

-  Overseas plants for manufacturing low value-added products. Concentrating efforts on competitive products. De-facto standards. Optimal automation. High-speed manufacturing.

•  Globalized  operation 

-  Optimal  resource  and  effort  distribution.  Material  and  component  acquisition  and  out-sourcing. 

•  Global  environment  and  material  recycling 

-  Design  for  recyclable  and  clean  products.  Clean  manufacturing.  Energy  and  resource  efficient 
manufacturing.  Disassembling  and  sorting-out  technology.  Waste  collection  systems. 

•  Manufacturing  culture  and  workers 

-  Enhanced  awareness  of  manufacturing  activities.  Education/training  of  manufacturing  workers. 
Professional  qualification. 

FUTURE  TECHNICAL  AREAS  OF  IMS 

The TTWG (Technical Themes Working Group, chaired by Y. Furukawa) surveyed and proposed possible technical themes taking these demands into consideration. Its report summarizes the problems facing many industrial sectors and products, accounting for both product variety and production systems, and proposes R&D themes in manufacturing technology, manufacturing systems and, where possible, management, as shown in Figure 5. These themes should be consulted when a new candidate research consortium is formed.

CONCLUSION 

This paper has introduced the current state of the IMS Program, which is conducted by the six participating countries and regions with the aim of solving problems encountered in manufacturing in today's global economy. The paper has emphasized the technologies addressed by the Program, describing the state of current projects and forecasting future directions. It is hoped that other authors will refer to this report to position their own research relative to the worldwide trend in manufacturing projects.




Analysis  of  Processes  and  Large  Data  Sets 
by  a  Self-Organizing  Method 

Teuvo  Kohonen 

Helsinki  University  of  Technology,  Neural  Networks  Research  Centre, 
P.O.  Box  2200,  FIN-02015  HUT,  Finland 


ABSTRACT 

Frequently  one  must  deal  with  natural  processes  and  data  for  which  no  known  models  can  be  derived  from 
classical  systems  theory.  Examples  are  discrete  measurements  from  distributed  processes  that  are  not 
identifiable,  and  data  generated  by  human  actions  such  as  speech,  text,  and  financial  time-series.  A  recent 
solution  is  that  relationships  between  the  elements  are  described  by  nonlinear  functional  expansions  called 
"neural  networks."  The  most  familiar  neural-network  models  make  use  of  supervised  learning,  which 
means  that  the  data  used  for  identification  must  be  verified,  validated,  and  preclassified.  Such  data, 
however, is very expensive and sometimes even impossible to acquire. A different approach altogether is unsupervised learning, which uses raw data, usually available en masse. In this presentation, the most widespread unsupervised-learning method, the Self-Organizing Map (SOM) algorithm, is described. The central idea in this algorithm, and in self-organization in general, is to use a large number of relatively simple and structurally similar, interacting, statistical submodels. Each submodel describes only a limited
domain  of  observations,  but  since  the  submodels  can  communicate,  they  can  mutually  decide  what  and  how 
large  a  domain  belongs  to  each  submodel.  By  virtue  of  such  collective  interactions  it  becomes  possible  to 
span  the  whole  data  space  nonlinearly,  thereby  minimizing  the  average  overall  modeling  error.  As  the  SOM 
implements  a  characteristic  nonlinear  projection  from  the  input  space  to  a  visual  display,  it  can  be  used, 
e.g.,  to  reveal  process  states  that  otherwise  would  escape  notice.  Applications  to  industry  and  "data  mining" 
in general are surveyed. A recently implemented curiosity of self-organizing maps, the mapping of all electronically available patent abstracts in the world onto a visual display, will also be reported.


INTRODUCTION 

With  the  increasing  computing  power,  it  has  become  possible  to  process  and  classify  masses  of  natural 
data,  such  as  statistical  information,  images,  speech,  as  well  as  other  kinds  of  signals  and  measurements 
coming  from  very  different  sources.  Such  tasks  occur  in  industry,  remote  sensing,  medicine,  finance,  and 
natural  sciences,  to  mention  only  a  few  main  fields.  For  financial,  medical,  administrative,  and  other 
databases,  one  needs  efficient  tools  for  visualization,  prediction,  clustering,  and  profiling.  In  industrial 
problems, it is essential to build empirical, data-based models of complex systems in order to be able to
monitor,  predict,  diagnose  faults,  and  control  the  systems. 

Natural information has properties that have not been taken into account in classical mathematical statistics, not even in traditional multivariate analyses. The dimensionalities of such data tend to be immense, a priori statistics are not available, parametric density functions cannot be found, and the mutual statistical dependencies between the data elements are usually nonlinear and dynamic.

In  the  early  1980s  a  new  line  of  computational  approaches  based  on  simple  models  of  biological  neurons 
was  launched.  The  insight  was  that  although  nonlinear  and  dynamical  statistical  descriptions  are  not 
available  in  analytical  form,  the  intrinsic  features  of  the  observations  and  their  interrelations  can 
nonetheless  be  learned  from  the  input  and  output  data  using  a  great  number  of  simultaneously  cooperating 
submodels.  This  approach  was  not  possible  until  the  computers  became  so  effective  that  high-dimensional 
submodels,  which  learned  their  structures  from  the  collective  interactions  between  the  data  elements,  could 
be  set  up.  There  have  also  existed  attempts  to  develop  special  massively  parallel  computers  for  such 
artificial  neural  networks,  but  at  least  for  the  time  being  the  hardware  has  not  brought  about  any 
breakthroughs  in  natural  computing. 






Among  the  many  neural-network  architectures  and  algorithms,  the  Self-Organizing  Map  (SOM)  [1]  is  in  a 
special  position,  because  it  is  able  to  form  abstract,  but  ordered  images  of  large  and  often  high-dimensional 
data  sets.  It  converts  complex,  nonlinear  statistical  relationships  between  high-dimensional  data  elements 
into  simple  geometric  relationships  between  points  on  a  low-dimensional  display.  As  it  thereby  compresses 
information  while  preserving  the  most  important  topological  and  metric  relationships  of  the  primary  data 
elements  on  the  display,  it  may  be  thought  to  produce  some  kinds  of  abstraction.  These  two  aspects, 
visualization  and  abstraction,  can  be  utilized  in  a  number  of  ways  in  complex  tasks  such  as  process 
analysis,  machine  perception,  control,  and  communication. 


THE  BASIC  PRINCIPLE  OF  THE  SELF-ORGANIZING  MAP  (SOM) 

The self-organizing process may be realized in any set of elements, schematically illustrated in Figure 1, where only a few basic operational conditions are assumed. For simplicity, let the elements (often signified as "neurons") form a regular planar array and let each element represent a set of numerical values $M_i$, which we call a model. We further assume that each model is modified by the messages the element receives.




Fig. 1. A self-organizing model set. An input message $X$ is broadcast to a set of models $M_i$, of which $M_c$ best matches $X$ according to some criterion. All models that lie in the vicinity of $M_c$ (larger circle) improve their matching with $X$. Note that $M_c$ differs from one message to another.


Let there exist some mechanism by which an incoming message $X$, a set of parallel signal values, can be compared with all models $M_i$. It is customary to speak of "competition" between the models when they receive common input, and the model whose parameters are fittest to this input is selected for the further steps of the process. This element is called the "winner" and is denoted by $M_c$. Another requirement for self-organization is that the models shall be modified only in the local vicinity of the winner(s) and that all the modified models shall then resemble the prevailing message better than before this step.

When the models in the neighborhood of the winner are made to resemble the prevailing message $X$ better, they also tend to become more similar to each other, i.e., the differences between all models in the neighborhood of $M_c$ are smoothed. Different messages at different times affect different parts of the set of models, and thus the models $M_i$, after many learning steps, start to acquire values that relate to each other smoothly over the whole array, in the same way as the various messages $X$ in the "signal space" do; in other words, maps related topologically to the set of messages start to emerge, as can be proven mathematically [2].




These three subprocesses (broadcasting of the input, selection of the winner, and adaptation of the models in the spatial neighborhood of the winner) seem to be sufficient, in the general case, to define a self-organization process that then results in the emergence of the topographically organized "maps".

The SOM usually consists of a 2-D regular grid of nodes. The SOM algorithms described below compute the models so that they optimally describe the domain of (discrete or continuously distributed) observations. The models are organized in a meaningful two-dimensional order such that similar models are closer to each other in the grid than dissimilar ones. In this sense the resulting SOM is also a similarity graph and a clustering diagram. Its computation is a nonparametric, recursive regression process.


THE  INCREMENTAL-LEARNING  SOM  ALGORITHM 

In the majority of practical applications, the input messages $X$ are represented by sets of values that constitute real vectors $x$. Similarly, the models $M_i$ are represented as real vectors $m_i$. Regression of a set of model vectors $m_i \in \Re^n$ into the space of observation vectors $x \in \Re^n$ is often made by the following sequential process, which ensures that the resulting models become ordered:

$$m_i(t+1) = m_i(t) + h_{c(x),i}\,\bigl(x(t) - m_i(t)\bigr), \tag{1}$$

where $t$ is the sample index of the regression step, whereupon the regression is performed recursively for each presentation of a sample of $x = x(t)$. Index $c$ ("winner") is defined by the condition

$$\forall i, \quad \|x(t) - m_c(t)\| \le \|x(t) - m_i(t)\|. \tag{2}$$

Here $h_{c(x),i}$ is called the neighborhood function; it acts like a smoothing kernel that is time-variable and whose location depends on condition (2). It is a decreasing function of the distance between the $i$th and $c$th models on the map grid. The norm is usually assumed to be Euclidean.


The neighborhood function is often taken to be the Gaussian

$$h_{c(x),i} = \alpha(t)\,\exp\!\left(-\frac{\|r_i - r_c\|^2}{2\sigma^2(t)}\right), \tag{3}$$

where $0 < \alpha(t) < 1$ is the learning-rate factor, which decreases monotonically with the regression steps, $r_i \in \Re^2$ and $r_c \in \Re^2$ are the vectorial locations in the display grid, and $\sigma(t)$ corresponds to the width of the neighborhood function, which also decreases monotonically with the regression steps.


A simpler definition of $h_{c(x),i}$ is the "bubble neighborhood", defined as follows: $h_{c(x),i} = \alpha(t)$ if $\|r_i - r_c\|$ is smaller than a given radius around node $c$ (whereupon this radius is a monotonically decreasing function of the regression steps, too), but otherwise $h_{c(x),i} = 0$. In this case we shall call the set of nodes that lie within the given radius the neighborhood set $N_c$.
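
The incremental process of Equations (1) to (3) can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch under stated assumptions: the function names `som_step` and `train_som`, the linear decay schedules for the learning rate and the neighborhood width, and the random initialization are choices made here for brevity and are not prescribed by the text.

```python
import numpy as np

def som_step(models, grid, x, alpha, sigma):
    """One incremental update: winner search (2), Gaussian kernel (3), update (1)."""
    # Eq. (2): the winner c minimizes the Euclidean distance ||x - m_i||.
    c = int(np.argmin(np.linalg.norm(models - x, axis=1)))
    # Eq. (3): the kernel is measured on the map grid, not in the input space.
    grid_dist2 = np.sum((grid - grid[c]) ** 2, axis=1)
    h = alpha * np.exp(-grid_dist2 / (2.0 * sigma ** 2))
    # Eq. (1): move every model toward x in proportion to its kernel value.
    models += h[:, None] * (x - models)
    return c

def train_som(data, rows=6, cols=6, n_steps=2000, seed=0):
    """Train a rows x cols map on data of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    # Grid locations r_i of the nodes in the 2-D display.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Random initialization inside the data bounding box (assumption, for brevity).
    models = rng.uniform(data.min(axis=0), data.max(axis=0),
                         size=(rows * cols, data.shape[1]))
    for t in range(n_steps):
        frac = t / n_steps
        alpha = 0.5 * (1.0 - frac) + 0.01                    # assumed linear decay
        sigma = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5   # assumed linear decay
        x = data[rng.integers(len(data))]
        som_step(models, grid, x, alpha, sigma)
    return models, grid
```

Replacing the Gaussian by the bubble neighborhood would only change the line that computes `h`: it would be set to `alpha` inside the radius and to zero outside.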


Some mathematicians may be more familiar with the so-called "principal curves" of Hastie and Stuetzle [3] and see a relationship between them and the SOM. However, the SOM was introduced eight years earlier than the principal curves, and it can be computed much more conveniently and effectively. There are also other differences; for instance, the SOM can be generalized in many ways that are not possible for the principal curves.

Another  principal  alternative  to  the  SOM  is  the  generative  topological  mapping  (GTM)  [4],  in  which  the 
mapping  directly  tends  to  preserve  the  topological-metric  relations  on  the  output  array.  It  has  turned  out, 
however,  that  numerous  shortcut  computations  can  be  applied  to  make  very  large  SOMs,  while  these 
methods  are  not  applicable  to  the  GTM. 

Due  to  the  many  stages  in  the  development  of  the  SOM  method  and  its  variations,  there  is  often  useless 
historical  ballast  in  the  computations. 




For instance, an old, ineffective principle is random initialization of the model vectors $m_i$. Random initialization was originally used to show that there exists a strong self-organizing tendency in the SOM, so that order can emerge even when starting from a completely unordered state, but this need not be demonstrated every time. On the contrary, if the initial values for the model vectors are selected as a regular array of vectorial values that lie on the subspace spanned by the eigenvectors corresponding to the two largest principal components of the input data, computation of the SOM can be made orders of magnitude faster, since (i) the SOM is then already approximately organized in the beginning, and (ii) one can start with a narrower and even time-constant neighborhood function and a smaller learning-rate factor.
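
The principal-component initialization described above might look as follows in Python with NumPy. This is a sketch under assumptions: the function name `linear_init` is hypothetical, and scaling each principal axis by the standard deviation of the data along it, with lattice coefficients in [-1, 1], is one plausible choice that the text does not specify. The data are assumed to have at least two dimensions.

```python
import numpy as np

def linear_init(data, rows, cols):
    """Regular rows x cols array of models on the plane of the two leading principal axes."""
    mean = data.mean(axis=0)
    # The sample covariance matrix is symmetric, so eigh applies.
    eigvals, eigvecs = np.linalg.eigh(np.cov(data - mean, rowvar=False))
    order = np.argsort(eigvals)[::-1][:2]          # indices of the two largest eigenvalues
    scale = np.sqrt(np.maximum(eigvals[order], 0.0))
    axes = eigvecs[:, order] * scale               # shape (n_features, 2)
    # Regular lattice of coefficients in [-1, 1] along each principal axis (assumed range).
    u = np.linspace(-1.0, 1.0, rows)
    v = np.linspace(-1.0, 1.0, cols)
    coeffs = np.array([(a, b) for a in u for b in v])
    return mean + coeffs @ axes.T
```

The returned array has one row per map node, in the same row-major node order as a grid built with nested loops over rows and columns, so it can serve directly as the starting point of an incremental or batch computation.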

Many computational aspects like this and the selection of proper parameter values have been discussed in the software package SOM_PAK [5], as well as in the book [1].

More  General  SOMs 

Above, the models consisted of ordered sets of real numbers (feature values or descriptors), regarded as real vectors in the Euclidean space. The same philosophy, however, applies to many other entities that are then ordered on a SOM array. For instance, the models can be strings of symbols, filters, operators with a finite number of parameters, or manifolds in the space defined, e.g., by a set of basis vectors. For the construction of an ordered map, the following requirements are sufficient:

1. It must be possible to define a distance measure between any two items (input items and/or models), on the basis of which any input item can be compared with all the model items and the "winner" identified.

2. The models must be updatable, such that the new value of each model in the "neighborhood" has a smaller or equal distance from the input.

A yet more general SOM is obtained when no distance measure is definable between the input items. Assuming that there exists a set of possible models, and a fitness function between each model and each input can be defined, the models can be ordered according to their functional similarity [6].


THE  BATCH  VERSION  OF  THE  SOM 

The incremental regression process defined by (1) and (2) can often be replaced by the following batch computation version of the SOM, which is significantly faster and does not require specification of any learning-rate factor $\alpha(t)$.

Assuming that the convergence to some ordered state is true, we require that the expectation values of $m_i(t+1)$ and $m_i(t)$ for $t \to \infty$ must be equal, even if $h_{ci}(t)$ were then selected nonzero. In other words, in the stationary state the values $m_i^*$ must satisfy the equilibrium equation

$$\forall i, \quad \mathrm{E}_t\bigl\{h_{c(x),i}\,(x - m_i^*)\bigr\} = 0. \tag{4}$$

In the special case where we have a finite number (batch) of the $x(t)$ with respect to which (4) has to be solved for the $m_i^*$, and $h_{c(x),i}$ represents the kernels used during the last phases of the learning process, we can write Equation (4) as

$$m_i^* = \frac{\sum_t h_{c(x),i}\,x(t)}{\sum_t h_{c(x),i}}. \tag{5}$$

This, however, is not yet an explicit solution for $m_i^*$, because the subscript $c(x)$ on the right-hand side still depends on $x(t)$ and all the $m_i$. The way of writing (5), however, allows us to apply the contractive mapping method known from the theory of nonlinear equations: starting with even coarse approximations for the $m_i^*$, (2) is first utilized to find the indices $c(x)$ for all the $x(t)$. On the basis of the approximate $h_{c(x),i}$ values, the improved approximations for the $m_i^*$ are computed from (5), which are then applied to (2), whereafter the computed $c(x)$ are substituted into (5), and so on. The optimal solutions $m_i^*$ are usually obtained in a few iteration cycles, after the discrete-valued indices $c(x)$ have settled down and are no longer changed in further iterations. This procedure is called the Batch Map principle.
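
The alternation between (2) and (5) can be illustrated with a short NumPy sketch. It is not the author's implementation: the function name `batch_map_kernel` is hypothetical, a fixed-width Gaussian kernel on the grid is assumed, and the learning-rate factor is omitted because it cancels in the ratio of Equation (5). The sketch also assumes the kernel width is large enough that no denominator underflows to zero.

```python
import numpy as np

def batch_map_kernel(data, models, grid, sigma, n_iter=20):
    """Iterate Eq. (5), alternating with the winner search of Eq. (2)."""
    models = models.copy()
    # Kernel values between all pairs of grid nodes: h[c, i].
    d2 = np.sum((grid[:, None, :] - grid[None, :, :]) ** 2, axis=2)
    h = np.exp(-d2 / (2.0 * sigma ** 2))
    winners = None
    for _ in range(n_iter):
        # Eq. (2) for every sample at once: c(x) = argmin_i ||x - m_i||.
        dist = np.linalg.norm(data[:, None, :] - models[None, :, :], axis=2)
        new_winners = dist.argmin(axis=1)
        if winners is not None and np.array_equal(new_winners, winners):
            break  # the indices c(x) have settled, so the models no longer change
        winners = new_winners
        # Eq. (5): weights[t, i] = h_{c(x(t)), i}; weighted mean of the samples.
        weights = h[winners]
        models = (weights.T @ data) / weights.sum(axis=0)[:, None]
    return models, winners
```

Because every updated model is a weighted mean of the samples, the models stay inside the region spanned by the data regardless of how coarse the starting approximation is.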

An even simpler Batch Map principle is obtained if $h_{c(x),i}$ is defined in terms of the neighborhood set $N_c$.
Further we need the concept of the Voronoi set. It means a domain $V_i$ in the x space, or actually the set of
those samples $x(t)$ that lie closest to $m_i^*$. Let us recall that we defined $N_i$ as the set of nodes that lie up to
a certain radius from node i in the array. The union of Voronoi sets $V_j$ corresponding to the nodes in $N_i$
shall be denoted by $U_i$. Then (5) can be written

$$m_i^* = \frac{\sum_{x(t) \in U_i} x(t)}{n(U_i)} , \qquad (6)$$

where $n(U_i)$ means the number of samples $x(t)$ that belong to $U_i$.


Notice again that the $U_i$ depend on the $m_i^*$, and therefore (6) must be solved iteratively. The procedure
can be described as the following steps:

1. Initialize the values of the $m_i^*$ in some proper way. (Even random values for the $m_i^*$ will usually do.)

2. Input all the $x(t)$, one at a time, and list each of them under the model $m_i^*$ that is closest to $x(t)$
according to (2).

3. Let $U_i$ denote the union of the above lists at model $m_i^*$ and its neighbors that constitute the
neighborhood set $N_i$. Compute the means of the vectors $x(t)$ in each $U_i$, and replace the old values of
$m_i^*$ by the respective means.

4. Repeat from 2 a few times until the solutions can be regarded as steady.
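The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used by the author: the nodes form a one-dimensional array, the neighborhood set of node i is every node within `radius` array positions, and the names `batch_map`, `radius`, `n_iter`, and `seed` are hypothetical.

```python
import random

def batch_map(samples, n_nodes, radius=1, n_iter=10, seed=0):
    """Batch Map on a 1-D array of nodes (assumed topology), steps 1-4."""
    rng = random.Random(seed)
    # Step 1: initialize the models, here with randomly chosen samples.
    models = [list(rng.choice(samples)) for _ in range(n_nodes)]
    for _ in range(n_iter):
        # Step 2: list each sample under its closest model (its Voronoi set).
        lists = [[] for _ in range(n_nodes)]
        for x in samples:
            c = min(range(n_nodes),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(x, models[i])))
            lists[c].append(x)
        # Step 3: U_i is the union of the lists of node i and its neighbors;
        # the new model is the mean of the samples in U_i.
        new_models = []
        for i in range(n_nodes):
            lo, hi = max(0, i - radius), min(n_nodes, i + radius + 1)
            union = [x for j in range(lo, hi) for x in lists[j]]
            if union:
                new_models.append([sum(col) / len(union) for col in zip(*union)])
            else:
                new_models.append(models[i])  # keep the old value if U_i is empty
        # Step 4: stop once the models no longer change.
        if new_models == models:
            break
        models = new_models
    return models
```

Keeping the old model when a union is empty is a design choice of this sketch; the text does not say how that case should be handled.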


For the case in which neighborhood sets $N_i$ are used,

$$m_i^* = \frac{\sum_{j \in N_i} n_j\,\bar{x}_j}{\sum_{j \in N_i} n_j} . \qquad (7)$$


A further acceleration of computation results if one notes that for the different nodes i, the same addends
occur a great number of times. It is advisable to first compute the mean $\bar{x}_j$ of the $x(t)$ in each Voronoi set
$V_j$ and then weight it by the number $n_j$ of samples in $V_j$ and the neighborhood function. Now we obtain

$$m_i^* = \frac{\sum_j n_j h_{ji}\,\bar{x}_j}{\sum_j n_j h_{ji}} , \qquad (8)$$

where the sum over j is taken for all units of the SOM. A convergence and ordering proof of the Batch Map
has been presented in [7].
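A single update according to (8) might look as follows. The sketch assumes a Gaussian neighborhood function of the distance between node positions on the array; the text does not fix the form of the kernel, and the names `batch_update`, `grid`, and `sigma` are hypothetical.

```python
import math

def batch_update(samples, models, grid, sigma=1.0):
    """One Batch Map update from Voronoi-set sums and an assumed Gaussian h_ji."""
    n = len(models)
    dim = len(models[0])
    sums = [[0.0] * dim for _ in range(n)]   # sum of the samples in each V_j
    counts = [0] * n                         # n_j, the number of samples in V_j
    for x in samples:
        c = min(range(n), key=lambda i: math.dist(x, models[i]))
        counts[c] += 1
        for k in range(dim):
            sums[c][k] += x[k]
    new_models = []
    for i in range(n):
        num = [0.0] * dim
        den = 0.0
        for j in range(n):
            if counts[j] == 0:
                continue  # an empty Voronoi set contributes nothing
            h = math.exp(-math.dist(grid[i], grid[j]) ** 2 / (2 * sigma ** 2))
            den += counts[j] * h
            # n_j * h_ji * mean_j equals h_ji times the stored sum for node j
            for k in range(dim):
                num[k] += h * sums[j][k]
        new_models.append([v / den for v in num])
    return new_models
```

Each sample is visited once to accumulate the per-node sums and counts; the loop over node pairs then reuses them, which is the saving the paragraph above describes.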


There  is  a  Matlab  SOM  Toolbox  program  package  available  on  the  Internet  at  the  address 
http://www.cis.hut.fi/projects/somtoolbox/,  which  involves  the  Batch  Map  method. 


A  BRIEF  OVERVIEW  OF  SOM  APPLICATIONS 

The  four  most  promising  application  areas  of  the  SOM  are: 

•  analysis  and  control  of  industrial  processes  and  machines 

•  various  tasks  in  telecommunications 

•  exploratory  data  analysis  and  knowledge  discovery  in  databases  (KDD) 

•  biomedical  analyses  and  applications. 




Analysis  of  Processes  and  Machines 

An  industrial  plant  or  complex  machine  is  traditionally  described  in  terms  of  physical,  chemical,  or  other 
state  variables,  which  are  usually  related  in  a  highly  nonlinear  way.  The  system  model,  which  is  usually 
distributed, may then not be identifiable from measurements. Nonetheless, there may exist far fewer
characteristic  states  or  state  clusters  in  the  system  that  determine  its  general  behavior  and  are  somehow 
reflected  in  the  measurements.  As  the  SOM  is  a  nonlinear  projection  method,  such  characteristic  states  or 
clusters  can  often  be  made  visible  in  the  self-organizing  map,  without  explicit  modeling  of  the  system. 

Consider a system from which several discrete measurements relating to the system itself as well as to its
environment are taken. These measurements and the values of the control variables may be regarded as
constituting the data vector x used in the training of the SOM. The scale of each variable can be normalized
so that either all variables have the same minimum and maximum, or the variance of every variable is the
same. Assume that the model vectors are $m_i = [\mu_{i1}, \mu_{i2}, \ldots, \mu_{in}]^T$. The component plane j of
the SOM is defined as the array of values $\mu_{ij}$ that represent the jth components of all the model vectors
$m_i$. The component plane can be displayed as an array of squares in the same format as the SOM array,
colored with shades of gray or pseudocolors according to the values $\mu_{ij}$.
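The normalization and the component plane can be illustrated with two small helpers. Both names are hypothetical, the models are assumed to be stored row by row for a rectangular map, and subtracting the mean is an extra convenience of this sketch, since the text only asks for equal variances.

```python
def normalize_variance(data):
    """Scale every variable (column) to zero mean and unit variance."""
    cols = list(zip(*data))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5
            for c, m in zip(cols, means)]
    # A constant variable (zero spread) is mapped to 0.0 to avoid division by zero.
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in data]

def component_plane(models, rows, cols, j):
    """Return the j-th components of all models as a rows x cols array."""
    return [[models[r * cols + c][j] for c in range(cols)] for r in range(rows)]
```

The array returned by `component_plane` has the same layout as the map, so it can be passed directly to any routine that draws a matrix in gray levels.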

A  practical  example  of  the  use  of  the  SOM  is  shown  in  Figure  2,  where  a  power  transformer  has  been 
analyzed  [8].  In  this  example,  the  states  relating  to  the  measurements  of  ten  variables  (voltages,  currents, 
temperatures,  hydrogen  content  of  the  insulating  coolant)  have  been  followed  during  a  period  of  24  hours. 
The  trajectory  has  been  drawn  on  the  component  plane  that  displays  the  load  current  of  the  transformer. 
Dark  gray  tones  correspond  to  small  and  light  tones  to  heavy  loads,  respectively.  It  can  be  seen  that  the 
operating  point  has  moved  from  dark  to  light  areas  and  back  again  corresponding  to  the  daytime  operation 
of  the  system,  when  the  load  was  switched  on  and  off,  respectively. 


Fig. 2. Each small square represents the value of one component j of some model vector $m_i$, in gray scale.
The trajectory drawn in white describes the sequence of “winner” units on the SOM, in response to
a sequence of input vectors {x(t)} taken over some period of time. This picture describes the load
current (white: high, black: low), whereby its values can be read as a function of time from the
gray-shade values along the trajectory. The form of the trajectory is defined by the sequence of
states of the transformer, as a function of switching the load on and off, during 24 h of operation.


In addition to visualizing the normal operating conditions by the SOM, it would be desirable to be able to
visualize fault states, too. The main problem is where to obtain abnormal but still typical measurements
for use in the training of the SOM. The faults may be rare and true measurements thus not available. In
some cases, fault situations can be produced during the development and testing of the equipment, but
especially in the case of industrial processes and big machines, production of a sufficient number of severe
faults might be too costly. Then the faults must be simulated, and the simulated input vectors x used for
training.

Semi-automatic  Control  of  Industrial  Processes 

Traditionally,  in  order  to  be  able  to  control  a  process,  a  global  system  model  ought  to  be  available,  and  it 
must be possible to place a sufficient number of transducers at locations where they measure essential process
variables.  Typical  problems  are  thus  that  the  most  relevant  places  in  the  process  are  frequently  not 
accessible,  and  the  most  essential  properties  of  the  product  may  only  be  measurable  off-line.  Therefore, 
estimation  of  process  or  signal  states  is  frequently  made  indirectly.  For  instance,  elaborate  methods  such  as 
multivariable  analyses  or  computation  of  cross  correlations  and  power  spectra  have  been  used;  long  series 
of  measurements  are  thereby  needed,  and  still  the  statistical  dependencies  may  only  be  obtainable  in  the 
linear  approximation. 

When  using  the  SOM,  a  physically  or  chemically  definable  process  model  is  not  necessary.  The  map  is 
computed  from  normalized  measurements  over  a  training  period  (the  map  can  also  be  made  to  adapt  to  new 
measurements  continuously).  When  the  display  is  labeled  by  calibration  measurements,  it  will  be  possible 
to  monitor  the  process  state  by  following  the  trajectory  of  the  operating  point  on  the  map.  The  SOM  display 
then  facilitates  direct  semiautomatic  control  of  the  process:  the  operating  personnel  will  easily  learn  to  spot 
the  most  effective  control  variables  in  each  situation  and  to  adjust  them  in  such  a  direction  that  the 
trajectory  will  be  guided  to  the  allowed  region  in  the  shortest  way. 
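Following the operating point amounts to finding the winner unit for each new measurement vector and recording its position on the map. A minimal sketch, assuming a rectangular map whose models are stored row by row (the function names are hypothetical):

```python
def winner(x, models):
    """Index of the model closest to x (squared Euclidean distance)."""
    return min(range(len(models)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, models[i])))

def trajectory(measurements, models, cols):
    """Map a time series of measurement vectors to (row, col) map positions."""
    return [divmod(winner(x, models), cols) for x in measurements]
```

Plotting the returned positions in time order on a labeled map gives the kind of trajectory the operating personnel would watch.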

Miscellaneous  process  applications  are  reported  in  the  review  article  [9]. 

Applications  to  Material  Science 

The following stray examples have a bearing on material science studies: identification of car body steel
[10], composite damage assessment [11], grading of beer quality [12], operation guidance in a blast furnace
[13], shear velocity estimation [14], flow regime identification and flow rate measurement [15, 16],
analysis of particle jets [17], and intrusion detection [18].

Various  Tasks  in  Telecommunications 

In  telecommunications,  one  may  monitor,  e.g.,  telephone  traffic  and  communication  networks  by  the  SOM 
on  the  basis  of  their  statistics,  but  there  also  exist  numerous  tasks  in  the  telecommunications  technology  for 
which  the  SOM  would  bring  about  viable  solutions.  One  of  them  is  the  quadrature-amplitude  modulation 
(QAM)  in  digital  communications,  especially  adaptive  demodulation,  equalization,  and  intersymbol  and 
interchannel  noise  cancellation  in  high-definition  television  (HDTV).  Another  task  for  which  SOM 
solutions  have  so  far  only  been  demonstrated  by  simulation  is  efficient  encoding  and  decoding  of  images 
by  a  pair  of  SOMs,  which  tolerate  transmission  errors  much  better  than  the  more  traditional  vector 
quantization  methods. 

Maps  of  Document  Collections  (WEBSOM) 

The basic SOM carries out a clustering in the Euclidean vector space. Surprisingly, the same vector-space
clustering methods sometimes apply even to entities that are basically symbolic by their nature. We shall
show in the following that it is possible to perform the clustering of free-text, natural-language documents,
if their contents are described statistically by the usage of different words in them. Document word statistics
can be shown to be very powerful for the discrimination between different documents and their topic areas.

A rather old idea is to use word histograms to characterize texts. As the number of documents grows,
and if no restrictions are set on their language, the vocabulary to be taken into account can be huge; e.g., in
Internet documents its size may be over a million. A trivial method would be to consider histograms of the
most  important  words  only,  but  this  requires  plenty  of  manual  work,  and  the  discriminatory  power  of  such 
representations  remains  low.  Nonetheless  it  will  always  be  a  good  strategy  to  omit  the  words  that  occur 
most  seldom,  and  then  it  is  possible  to  reduce  the  vocabulary  to,  say,  100000  items.  But  such  histograms 
would  still  correspond  to  real  vectors  of  dimensionality  100000,  which  is  computationally  very  heavy. 
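Building a word histogram over a pruned vocabulary can be sketched as follows. The whitespace tokenization, the `min_count` threshold, and both function names are assumptions made for illustration; the text does not specify how words are extracted or where the cutoff for rare words lies.

```python
from collections import Counter

def build_vocabulary(documents, min_count=2):
    """Index the words that occur at least min_count times in the collection."""
    totals = Counter(word for doc in documents for word in doc.lower().split())
    kept = sorted(w for w, c in totals.items() if c >= min_count)
    return {w: k for k, w in enumerate(kept)}

def word_histogram(document, vocabulary):
    """Sparse histogram {word_index: count} of the vocabulary words in a document."""
    counts = Counter(w for w in document.lower().split() if w in vocabulary)
    return {vocabulary[w]: c for w, c in counts.items()}
```

Storing only the nonzero counts keeps each histogram small even when the vocabulary has on the order of 100000 entries.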




The histograms, however, are usually very sparsely occupied: in one document one may use only, say, a
few dozen to a couple of hundred different words, depending on its length. Therefore a simple but still
effective method to reduce the dimensionality of the representation vectors, without essentially decreasing
their discriminatory power, is to project them randomly onto a much lower-dimensional Euclidean space. If
the original histogram is denoted by vector n, and R is a rectangular matrix with random but normalized
columns, the vectors x = Rn can then have a dimensionality of, say, a couple of hundred, and be used in
place of the n as input data vectors to the SOM, without noticeably reducing the classification accuracy.
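The projection x = Rn can be sketched as below. The text only requires random, normalized columns; drawing the entries from a Gaussian distribution is an assumption of this example, as are the function names and the sparse `{word_index: count}` representation of the histogram n.

```python
import random

def random_projection_matrix(out_dim, in_dim, seed=0):
    """Return R as a list of in_dim columns, each a unit-length random vector."""
    rng = random.Random(seed)
    columns = []
    for _ in range(in_dim):
        col = [rng.gauss(0.0, 1.0) for _ in range(out_dim)]
        norm = sum(v * v for v in col) ** 0.5
        columns.append([v / norm for v in col])
    return columns

def project(histogram, columns):
    """Compute x = R n for a sparse histogram given as {word_index: count}."""
    out_dim = len(columns[0])
    x = [0.0] * out_dim
    for word, count in histogram.items():
        col = columns[word]
        for k in range(out_dim):
            x[k] += count * col[k]
    return x
```

Because the histogram is sparse, the cost of one projection grows with the number of distinct words in the document rather than with the size of the vocabulary.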

Without going into all computational details, the document-clustering SOM called the WEBSOM produces
the visual display of the document collection in the following steps:

1. Some preprocessing of the texts is first carried out, removing nontextual symbols and very rare words.
Eventually, a stemmer is used to transform all word forms into their most probable stem words.

2. The word histogram of each document is projected randomly onto a space of dimensionality 300 to 500,
thereby obtaining a representation vector x for each document.

3. A SOM is formed using the x as input data.

4. The models $m_i$ formed at the nodes of the SOM are labeled by all those documents that are mapped
onto the said node. In practice, the nodes are linked to the proper document data base by address pointers
(indexing).

5. Standard browsing software tools are used to read the documents mapped to the SOM nodes.

It is possible to use a “key document,” or alternatively, a set of keywords as a search argument to find the
best-matching node of the SOM, where browsing may begin.

An  example  of  HTML  pages  found  in  browsing  is  given  in  Fig.  3. 

We  have  already  implemented  WEBSOM  systems  for  the  following  applications: 

•  Internet  Usenet  newsgroups;  the  largest  experiment  consisted  of  85  newsgroups,  with  a  total  of  over  1 
million  documents.  The  size  of  the  SOM  was  thereby  104  448  nodes. 

•  News  bulletins  (in  Finnish)  of  the  Finnish  News  Agency  (Finnish  Reuter). 

•  Patent abstracts (English) that are available in electronic form. The largest demonstration, being
finished during the writing of this report, consists of seven million patent abstracts from the U.S., Japanese,
and European patent offices, and the SOM array consists of 1 002 240 nodes.

Demonstrations  of  various  WEBSOM  displays  are  available  on  the  Internet  at  the  address 
http://websom.hut.fi/websom/,  where  they  can  be  browsed  with  standard  www  browsers. 

Other  Major  Application  Areas  of  the  SOM 

Applications to medicine have barely started to spread. In the analysis of brain signals, EEG and MEG, it
has been possible to classify various waking and sleep states on the SOM. Various biochemical analyses
and  interpretation  of  results  in  clinical  chemistry  already  use  SOM  methods  that  have  been  transferred  to 
practice.  An  important  mode  of  use  of  the  SOM  is  the  profiling  of  patients  for  their  diagnosis  and 
treatment. 

In  addition  to  the  four  main  areas  mentioned  above,  one  may  report  numerous  tasks  in  finance,  ranging 
from  the  analysis  and  prediction  of  time  series  to  the  classification  and  evaluation  of  macroeconomic 
systems.  We  have  cooperated  with  the  World  Bank,  analyzing  their  socioeconomic  data  in  many  ways  [19]. 
Analyses of the financial performance [20] and bankruptcies [21] of companies are being made using the SOM
method.  The  reform  of  the  Finnish  forest  taxation  in  1992,  i.e.,  an  option  given  to  the  owners  to  choose 
between  two  taxation  policies,  was  based  on  a  cluster  analysis  made  by  the  SOM  method. 

One also ought to mention the following examples from other fields of science. The Finnish Meteorological
Institute has used the SOM to classify clouds [22] from infrared satellite images (especially at nighttime),
and in the USA, astronomical data produced by the Hubble Space Telescope [23] have been analyzed; a
previously unknown classification of thousands of galaxies at moderate redshifts has thereby emerged.




[Figure 3: screenshot of two adjacent WEBSOM nodes found for the query "chess playing neural nets,
NN chess player vs. human player". Each node lists Usenet articles, among them "Looking for Neural-Net
based Chess/Checkers", "Article on Kasparov vs Deep Blue", "Re: Computer scores historic chess win over
Kasparov", and "Re: Programming a computer to play".]

Fig. 3. Content-addressable search from a 1 124 134-document WEBSOM. The contents of two adjacent
nodes are shown.


It is not possible to survey the whole range of applications of the SOM method in more detail in this paper.
Let it suffice to refer to the list of 3343 research papers on the SOM [24] that is also available at the
Internet address http://www.icsi.berkeley.edu/~jagota/NCS/vol1.html.




REFERENCES 

1. T. Kohonen, 1995. Self-organizing maps. Series in Information Sci., 30. Springer, Heidelberg. 2nd Ed. 1997.

2. M. Cottrell, J. C. Fort, and G. Pages, 1997. Theoretical aspects of the SOM algorithm. Proc.
WSOM97, Workshop on Self-Organizing Maps, Espoo, Finland, 246-267.

3. T. Hastie and W. Stuetzle, 1989. Principal curves. J. Am. Stat. Assoc., 84, 502-516.

4. C. Bishop, M. Svensen, and C. Williams, 1996. GTM: a principled alternative to the self-organizing
map. In Artificial Neural Networks - ICANN 96, 1996 Int. Conf. Proc., C. v.d. Malsburg, W. von
Seelen, J. Vorbruggen, and B. Sendhoff, Eds., pp. 165-7. Springer-Verlag, Berlin.

5. T. Kohonen, J. Hynninen, J. Kangas, and J. Laaksonen, 1996. SOM_PAK: The self-organizing map
program package. Helsinki University of Technology, Laboratory of Computer and Information
Science, Espoo, Finland, Report A31.

6. T. Kohonen, 1999. Fast evolutionary learning with batch-type self-organizing maps. Neural Process.
Lett., in press.

7. Y. Cheng, 1997. Convergence and ordering of Kohonen’s batch map. Neural Comput., 9, 1667-1676.

8. M. Kasslin, J. Kangas, and O. Simula, 1992. Process state monitoring using self-organizing maps, in
Artificial Neural Networks, I. Aleksander, J. Taylor, Eds., 2, 1532-1534. North-Holland, Amsterdam.

9. T. Kohonen, E. Oja, O. Simula, A. Visa, and J. Kangas, 1996. Engineering applications of the
self-organizing map. Proc. IEEE, 84(10), 1358-1384.

10. W. Kessler, D. Ende, R.W. Kessler, and W. Rosenstiel, 1993. Identification of car body steel by an
optical on line system and Kohonen’s self-organizing map. Proc. ICANN93, Int. Conf. on Artificial
Neural Networks, S. Gielen and B. Kappen, Eds., p. 860. Springer, London.

11. B. Grossman, X. Gao, and M. Thursby, 1991. Composite damage assessment employing an optical
neural network processor and an embedded fiber optic sensor array. Proc. SPIE-Int. Soc. for Opt.
Eng., 1588, 64-75.

12. Y. Cai, 1994. The application of the artificial neural network in the grading of beer quality. Proc.
WCNN94, 1, 516-520. Lawrence Erlbaum, Hillsdale, NJ.

13. M. Konishi, Y. Otsuka, K. Matsuda, N. Tamura, A. Fuki, and K. Kadoguchi, 1990. Application of
neural network to operation guidance in blast furnace. Third European Seminar on Neural Computing:
The Marketplace, 13. IBC Tech. Services, London.

14. P. Burrascano, P. Lucci, G. Martinelli, and R. Perfetti, 1990. Shear velocity estimation by the
combined use of supervised and unsupervised neural networks. Proc. ICASSP-90, Int. Conf. on
Acoustics, Speech and Signal Processing, IV, 1921-1924. IEEE, Piscataway, NJ.

15. S. Cai, H. Toral, and J. Qiu, 1993. Flow regime identification by a self-organizing neural network.
Proc. ICANN93, Int. Conf. on ANNs, S. Gielen, B. Kappen, Eds., 868. Springer, London.

16. S. Cai and H. Toral, 1993. Flowrate measurement in air-water horizontal pipeline by neural network.
Proc. IJCNN-93, Int. Joint Conf. on Neural Nets, Nagoya, II, 2013-2016. IEEE, Piscataway, NJ.

17. K.H. Becks, J. Dahm, and F. Seidel, 1992. Analysing particle jets with artificial neural networks.
Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, 5th Int. Conf.
IEA/AIE-92, F. Belli and F.J. Radermacher, Eds., 109-112. Springer, Berlin, Heidelberg.

18. K.L. Fox, R.R. Henning, J.H. Reed, and R.P. Simonian, 1990. A neural network approach towards
intrusion detection. Proc. 13th National Computer Security Conference. Information Systems Security.
Standards - The Key to the Future, I, 124-134. NIST, Gaithersburg, MD.

19. G. Deboeck and T. Kohonen, Eds. 1998. Visual Exploration in Finance with Self-Organizing Maps.
Springer-Verlag, London.

20. B. Back, K. Sere, and H. Vanharanta, 1997. Analyzing financial performance with self-organizing
maps. Proc. WSOM97, Workshop on Self-Organizing Maps, Espoo, Finland, 356-361.

21. K. Kiviluoto, 1998. Predicting bankruptcies with the self-organizing map. Neurocomput., 21, 191-201.

22. A. Visa, K. Valkealahti, J. Iivarinen, and O. Simula, 1991. Experiences from operational cloud
classifier based on self-organizing map. In Proc. SPIE - The Int. Society for Optical Engineering,
Applications of Artificial Neural Networks V, 2243, 484-495.

23. A. Naim, K. U. Ratnatunga, and R. E. Griffiths, 1997. Galaxy morphology without classification:
Self-organizing maps. Astrophys. J. Suppl. Series, 111, 357-367.

24. S. Kaski, J. Kangas, and T. Kohonen, 1998. Bibliography of self-organizing map (SOM) papers: 1981-
1991. Neural Computing Surveys, 1(3&4), 1-176. http://www.icsi.berkeley.edu/~jagota/NCS/vol1.html




Rough  Set  Theory  for  Intelligent  Industrial  Applications 

Zdzisław Pawlak

Institute  of  Theoretical  and  Applied  Informatics, 

Polish  Academy  of  Sciences,  Poland 
Email:  zpw@ii.pw.edu.pl 


ABSTRACT 

Application  of  intelligent  methods  in  industry  has  become  a  very  challenging  issue  nowadays  and  will  be  of 
extreme  importance  in  the  future.  Intelligent  methods  include  fuzzy  sets,  neural  networks,  genetic  algorithms 
and  other  techniques  known  as  soft  computing.  No  doubt,  rough  set  theory  can  also  contribute  to  this  domain.  In 
this  paper  basic  ideas  of  rough  set  theory  are  presented  and  some  possible  intelligent  industrial  applications 
outlined. 


INTRODUCTION 

Rough  set  theory  is  a  new  mathematical  approach  to  data  analysis.  The  basic  idea  of  this  method  hinges  on 
classifying  objects  of  interest  into  similarity  classes  (clusters)  containing  objects  which  are  indiscernible  with 
respect  to  some  features,  e.g.,  color,  temperature,  etc.  which  form  basic  building  blocks  of  knowledge  about 
reality  and  are  employed  to  find  hidden  patterns  in  data.  The  basis  of  rough  set  theory  is  found  in  [20,22,26]. 

Rough  set  theory  has  some  overlap  with  other  methods  of  data  analysis,  e.g.,  statistics,  cluster  analysis,  fuzzy 
sets,  evidence  theory  and  others,  but  it  can  be  viewed  in  its  own  right  as  an  independent  discipline.  The  rough 
set  approach  seems  to  be  fundamental  to  AI  and  cognitive  sciences,  especially  in  the  areas  of  machine  learning, 
knowledge  acquisition,  decision  analysis,  knowledge  discovery  from  databases,  expert  systems,  inductive 
reasoning  and  pattern  recognition.  It  seems  particularly  important  in  decision  support  systems  and  data  mining. 

Rough  set  theory  has  been  successfully  applied  to  solve  many  real-life  problems  in  medicine,  pharmacology, 
engineering,  banking,  financial  and  market  analysis  and  others.  More  about  applications  of  rough  set  theory  can 
be  found  in  [9,17,19,26,28,35,40]  and  others.  Very  promising  areas  of  new  applications  for  rough  sets  are 
expected  to  emerge  in  the  near  future.  These  include  rough  control,  rough  data  bases,  rough  information 
retrieval,  rough  neural  networks  and  others. 


ROUGH  SETS  AND  INTELLIGENT  INDUSTRIAL  APPLICATIONS 

An artificial intelligence approach to industrial processes is a real challenge for industry in the years to come.
Rough set theory seems to be particularly suited for problem solving in this area. Some examples are:

1)  Material  sciences.  Application  of  rough  sets  to  new  materials  design  and  investigating  material  properties 
has  already  shown  the  usefulness  of  the  theory  in  this  area.  Pioneer  work  in  this  domain  is  due  to  Jackson  et 
al  [4,5,6].  It  is  also  interesting  to  note  the  work  on  applying  rough  sets  to  investigate  the  relationship 
between  structure  and  activity  of  drugs  [28].  The  method  used  can  be  also  applied  to  any  other  type  of 
materials. 

2)  Intelligent  control.  Industrial  process  control  in  many  cases,  especially  highly  non-linear  systems,  cannot  be 
treated  successfully  with  classical  control  theory  methods.  It  turns  out  that  in  this  case  fuzzy  sets,  neural 
networks,  genetic  algorithms  offer  very  good  solutions.  Also  rough  sets  can  be  used  here  in  many  cases. 
Cement  kiln  control  algorithms  obtained  from  observation  of  stoker  actions  and  blast  furnace  control  in  iron 
and  steel  works  are  exemplary  applications  of  rough  set  techniques  in  intelligent  industrial  control  [11]. 
Satellite attitude control [25] is another non-trivial example of the application of rough set theory in
intelligent control. More on applications of this theory in control can be found in [3,8,12,13,17,23,33,34,39,
41,42,43]. The rough set approach in control offers simple and fast algorithms, which can be obtained
from observation of the controlled process, from a mathematical model of the process, or from a
knowledgeable expert.

0-7803-5489-3/99/$10.00 ©1999 IEEE.

3) Decision support systems. Rough set based decision support systems can be used for many kinds of
industrial decision-making on various levels, from specific industrial processes to management and business
decisions [28,29,30].

4) Machine diagnosis. The rough set approach has been used to provide technical diagnosis of mechanical
objects by analysing vibroacoustic symptoms [14,15,16,31,32].

5) Neural networks. Neural networks have found many interesting applications in intelligent control of
industrial processes. Combining neural networks with fuzzy sets adds a new dimension to this domain. Rough
sets and neural networks can also be linked together and give better results and greater speed than the
classical neural network approach alone. Besides, an interesting idea of a rough neural network has been
proposed in [10]. More about rough sets and neural networks can be found in references given in [26].

6) Varia. Besides the above domains of intelligent industrial applications of rough sets there are many other
fields where the rough set approach can be useful [1,2,7,19,26,35,37,38,43].

The above list of possible applications of rough sets is of course not exhaustive, but it shows areas where
the application of rough sets has already proved to be of use.

The  rough  set  approach  has  many  advantages.  The  most  important  ones  are  listed  below. 

•  Provides  efficient  algorithms  for  finding  hidden  patterns  in  data. 

•  Identifies  relationships  that  would  not  be  found  while  using  statistical  methods. 

•  Allows  both  qualitative  and  quantitative  data. 

•  Finds  minimal  sets  of  data  (data  reduction). 

•  Evaluates  significance  of  data. 

•  Generates  sets  of  decision  rules  from  data. 

•  Is easy to understand.

•  Offers  straightforward  interpretation  of  obtained  results. 

No  doubt  rough  set  theory  can  be  used  in  many  branches  of  intelligent  industrial  applications  as  an  independent, 
complementary  approach  or  combined  with  other  areas  of  soft  computing,  e.g.  fuzzy  sets,  neural  networks,  etc. 


APPROXIMATIONS  -  BASIC  CONCEPTS  OF  ROUGH  SET  THEORY 

Data are usually given in the form of a data table, also called an attribute-value table, information table or
database. A database is a table whose rows are labeled by objects and whose columns are labeled by
attributes. Entries of the table are attribute values. An example of a database is shown in Table 1.


Table  1.  An  attribute-value  table  of  data. 


Store   E      Q      L     P
1       high   good   no    profit
2       med.   good   no    loss
3       med.   good   no    profit
4       no     avg.   no    loss
5       med.   avg.   yes   loss
6       high   avg.   yes   profit

In  the  database  six  stores  are  characterized  by  four  attributes: 

E  -  empowerment  of  sales  personnel, 

Q - perceived quality of merchandise,

L  -  high  traffic  location, 

P  -  store  profit  or  loss. 




Each  subset  of  attributes  in  the  database  determines  a  partition  of  all  objects  into  clusters  having  the  same 
attribute  values,  i.e.,  displaying  the  same  features  expressed  in  terms  of  attribute  values.  In  other  words,  all 
objects  revealing  the  same  features  are  indiscernible  ( similar )  in  view  of  the  available  information  and  they 
form  blocks,  which  can  be  understood  as  elementary  granules  of  knowledge.  These  granules  are  called 
elementary  sets  or  concepts,  and  can  be  considered  as  elementary  building  blocks  (atoms)  of  our  knowledge 
about  the  reality  we  are  interested  in.  Elementary  concepts  can  be  combined  into  compound  concepts,  i.e., 
concepts  that  are  uniquely  defined  in  terms  of  elementary  concepts.  Any  union  of  elementary  sets  is  called  a 
crisp  set,  and  any  other  set  is  referred  to  as  rough  (vague,  imprecise). 

With every set X we can associate two crisp sets, called the lower and the upper approximation of X. The lower
approximation of X is the union of all elementary sets which are included in X, whereas the upper approximation
of X is the union of all elementary sets which have a non-empty intersection with X. In other words, the lower
approximation of X is the set of all elements that surely belong to X, whereas the upper approximation of X is
the set of all elements that possibly belong to X. The difference between the upper and the lower approximation
of X is its boundary region. Obviously a set is rough if it has a non-empty boundary region; otherwise the set is
crisp. Elements of the boundary region cannot be classified, employing the available knowledge, either to the set
or to its complement. Approximations of sets are basic operations in rough set theory and are used as the main
tools to deal with vague and uncertain data. Let us illustrate the above ideas by means of the data given in Table 1.

Each  store  has  a  different  description  in  terms  of  attributes  E,  Q,  L  and  P,  thus  all  stores  are  discernible  when 
employing  information  provided  by  all  attributes.  However,  stores  2  and  3  are  indiscernible  in  terms  of 
attributes  E,  Q  and  L,  since  they  have  the  same  attribute  values.  Similarly,  stores  1,  2  and  3  are  indiscernible 
with  respect  to  attributes  Q  and  L,  etc. 

Each  subset  of  attributes  determines  a  partition  (classification)  of  all  objects  into  classes  having  the  same 
description  in  terms  of  these  attributes.  For  example,  attributes  Q  and  L  aggregate  all  stores  into  the  following 
classes  {1,  2,  3},  {4},  {5,  6}.  Thus,  each  database  determines  a  family  of  classification  patterns  which  are  used 
for  further  considerations. 
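As an illustration, the partition induced by a subset of attributes can be computed by grouping objects on their
attribute values. The following Python sketch is not part of the original method description; it simply
transcribes Table 1 into a dictionary, and the names TABLE and partition are chosen here for illustration only.

```python
from collections import defaultdict

# Table 1, transcribed as store -> attribute values.
TABLE = {
    1: {"E": "high", "Q": "good", "L": "no",  "P": "profit"},
    2: {"E": "med.", "Q": "good", "L": "no",  "P": "loss"},
    3: {"E": "med.", "Q": "good", "L": "no",  "P": "profit"},
    4: {"E": "no",   "Q": "avg.", "L": "no",  "P": "loss"},
    5: {"E": "med.", "Q": "avg.", "L": "yes", "P": "loss"},
    6: {"E": "high", "Q": "avg.", "L": "yes", "P": "profit"},
}

def partition(table, attributes):
    """Group objects that share the same values on the given attributes."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        blocks[key].add(obj)
    # Order the blocks by their smallest member for a stable result.
    return sorted(blocks.values(), key=min)

print(partition(TABLE, ["Q", "L"]))       # [{1, 2, 3}, {4}, {5, 6}]
print(partition(TABLE, ["E", "Q", "L"]))  # [{1}, {2, 3}, {4}, {5}, {6}]
```

The first call reproduces the classes {1, 2, 3}, {4}, {5, 6} given above; the second shows that stores 2 and 3 are
the only pair that E, Q and L cannot tell apart.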

Let us consider the following problem: what are the characteristic features of stores making a profit (or having a
loss) in view of the information available in Table 1? That is, we want to describe the set (concept) {1, 3, 6}
(or {2, 4, 5}) in terms of attributes E, Q and L. Of course this question cannot be answered uniquely, since
stores 2 and 3 have the same values of attributes E, Q and L, but store 2 has a loss, whereas store 3 makes a
profit. Hence, in view of the information contained in Table 1, we can say for sure that stores 1 and 6 make a
profit and stores 4 and 5 have a loss, whereas stores 2 and 3 cannot be classified as making a profit or having a
loss. That is, employing attributes E, Q and L, we can say that stores 1 and 6 surely make a profit, i.e., surely
belong to the set {1, 3, 6}, whereas stores 1, 2, 3 and 6 possibly make a profit, i.e., possibly belong to the set
{1, 3, 6}. We say that the set {1, 6} is the lower approximation of the set (concept) {1, 3, 6}, and the set
{1, 2, 3, 6} is the upper approximation of the set {1, 3, 6}. The set {2, 3}, being the difference between the
upper approximation and the lower approximation, is referred to as the boundary region of the set {1, 3, 6}.

Now  let  us  give  some  formal  notations  and  definitions. 

By a database, we understand a pair S = (U, A), where U and A are finite, non-empty sets called the universe
and the set of attributes, respectively. With every attribute $a \in A$, we associate a set $V_a$ of its values,
called the domain of a. Any subset B of A determines a binary relation I(B) on U, which is called an
indiscernibility relation, defined as follows:

$(x, y) \in I(B)$ if and only if $a(x) = a(y)$ for every $a \in B$,

where a(x) denotes the value of attribute a for element x.

It can easily be seen that I(B) is an equivalence relation. The family of all equivalence classes of I(B), i.e., the
partition determined by B, is denoted by U/I(B), or simply U/B; an equivalence class of I(B), i.e., the block of
the partition U/B containing x, is denoted by B(x).




If (x, y) belongs to I(B) we say that x and y are B-indiscernible. Equivalence classes of the relation I(B) (or
blocks of the partition U/B) are referred to as B-elementary sets or B-granules.

Next, the indiscernibility relation is used to define two basic operations in rough set theory as follows:

$$B_*(X) = \bigcup_{x \in U} \{B(x) : B(x) \subseteq X\},$$

$$B^*(X) = \bigcup_{x \in U} \{B(x) : B(x) \cap X \neq \emptyset\},$$

which are called the B-lower and the B-upper approximation of X, respectively.

The set

$$BN_B(X) = B^*(X) - B_*(X)$$

is referred to as the B-boundary region of X. If the boundary region of X is the empty set, i.e.,
$BN_B(X) = \emptyset$, then the set X is crisp (exact) with respect to B; in the opposite case, i.e., if
$BN_B(X) \neq \emptyset$, the set X is referred to as rough (inexact) with respect to B.
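The lower and upper approximations and the boundary region can be computed directly from the elementary sets. The
following Python sketch is an illustration only, not an implementation taken from the paper; the data are those
of Table 1 and the helper names (ROWS, elementary_sets, approximations) are hypothetical.

```python
from collections import defaultdict

ATTRS = ("E", "Q", "L", "P")
ROWS = {  # Table 1: store -> (E, Q, L, P)
    1: ("high", "good", "no", "profit"),
    2: ("med.", "good", "no", "loss"),
    3: ("med.", "good", "no", "profit"),
    4: ("no", "avg.", "no", "loss"),
    5: ("med.", "avg.", "yes", "loss"),
    6: ("high", "avg.", "yes", "profit"),
}

def elementary_sets(attributes):
    """Blocks of the partition U/B for the attribute subset B."""
    idx = [ATTRS.index(a) for a in attributes]
    blocks = defaultdict(set)
    for store, values in ROWS.items():
        blocks[tuple(values[i] for i in idx)].add(store)
    return list(blocks.values())

def approximations(attributes, target):
    """Return the B-lower and B-upper approximation of the set target."""
    lower, upper = set(), set()
    for block in elementary_sets(attributes):
        if block <= target:   # block entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper

profit = {s for s, v in ROWS.items() if v[3] == "profit"}   # {1, 3, 6}
lower, upper = approximations(("E", "Q", "L"), profit)
boundary = upper - lower
print(sorted(lower), sorted(upper), sorted(boundary))
# [1, 6] [1, 2, 3, 6] [2, 3]
```

Since the boundary region {2, 3} is non-empty, the set of profitable stores is rough with respect to {E, Q, L}.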


DEPENDENCY OF ATTRIBUTES

Approximations of sets are closely related to the concept of dependency (total or partial) of attributes.

Suppose  the  set  of  attributes  A  is  partitioned  into  two  disjoint  subsets  C  and  D  called  condition  and  decision 
attributes,  respectively.  Databases  with  distinguished  condition  and  decision  attributes  are  referred  to  as 
decision  tables. 

Intuitively, a set of decision attributes D depends totally on a set of condition attributes C, denoted
$C \Rightarrow D$, if all values of decision attributes are uniquely determined by values of condition attributes.
In other words, D depends totally on C if there exists a functional dependency between values of C and D.

We  also  need  a  more  general  concept  of  dependency  of  attributes,  called  the  partial  dependency  of  attributes. 
Partial  dependency  means  that  only  some  values  of  D  are  determined  by  values  of  C.  Formally  dependency  can 
be  defined  in  the  following  way: 

Let C and D be subsets of A, such that $D \cap C = \emptyset$ and $D \cup C = A$.

We say that D depends on C in a degree k ($0 \le k \le 1$), denoted $C \Rightarrow_k D$, if

$$k = \gamma(C, D) = \sum_{X \in U/D} \frac{\mathrm{card}(C_*(X))}{\mathrm{card}(U)},$$

where card(X) is the cardinality of X.

If k = 1 we say that D depends totally on C, and if k < 1, we say that D depends partially (in a degree k) on C.

The coefficient k expresses the fraction of all elements of the universe which can be properly classified into
blocks of the partition U/D employing attributes C, and is called the degree of the dependency.

For example, the attribute P depends on the set of attributes {E, Q, L} in the degree 2/3. That means that only
four of six stores can be identified exactly by means of attributes E, Q and L as having a loss or making a profit.
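The degree 2/3 can be checked by counting, for each decision class, the stores in its lower approximation and
dividing the total by the size of the universe. A minimal Python sketch under the assumption that Table 1 is
transcribed as below (the function names are illustrative, not the author's):

```python
from collections import defaultdict
from fractions import Fraction

ATTRS = ("E", "Q", "L", "P")
ROWS = {  # Table 1: store -> (E, Q, L, P)
    1: ("high", "good", "no", "profit"),
    2: ("med.", "good", "no", "loss"),
    3: ("med.", "good", "no", "profit"),
    4: ("no", "avg.", "no", "loss"),
    5: ("med.", "avg.", "yes", "loss"),
    6: ("high", "avg.", "yes", "profit"),
}

def partition(attributes):
    """Blocks of objects that agree on all the given attributes."""
    idx = [ATTRS.index(a) for a in attributes]
    blocks = defaultdict(set)
    for store, values in ROWS.items():
        blocks[tuple(values[i] for i in idx)].add(store)
    return list(blocks.values())

def lower(attributes, target):
    """Lower approximation: union of blocks fully contained in target."""
    return {s for block in partition(attributes) if block <= target for s in block}

def gamma(condition, decision):
    """Degree of dependency of the decision on the condition attributes."""
    classified = sum(len(lower(condition, x)) for x in partition(decision))
    return Fraction(classified, len(ROWS))

print(gamma(("E", "Q", "L"), ("P",)))  # 2/3
```

Stores 1 and 6 (profit) and stores 4 and 5 (loss) are classified unambiguously, giving 4/6 = 2/3.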




REDUCTION  OF  ATTRIBUTES 

We  often  face  the  question  of  whether  we  can  remove  some  data  from  a  database  preserving  its  basic  properties, 
i.e.,  whether  a  table  contains  some  superfluous  data. 

Let  us  express  this  idea  more  precisely: 

Let $C, D \subseteq A$ be sets of condition and decision attributes, respectively. We say that $C' \subseteq C$ is a
D-reduct (reduct with respect to D) of C, if C' is a minimal subset of C such that

$$\gamma(C, D) = \gamma(C', D).$$

Hence  any  reduct  enables  us  to  reduce  condition  attributes  in  such  a  way  that  the  degree  of  dependency 
between  condition  and  decision  attributes  is  preserved.  In  other  words  reduction  of  condition  attributes  gives  the 
minimal  number  of  conditions  necessary  to  make  specified  decisions. 

In the database presented in Table 1, {E, Q} and {E, L} are the only two reducts of condition attributes with
respect to P, i.e., either the set {E, Q} or the set {E, L} can be used to classify stores instead of the whole
set of condition attributes {E, Q, L}.

For  large  databases  finding  reducts  on  the  basis  of  the  definition  given  above  is  rather  difficult  because  the 
definition  leads  to  inefficient  algorithms.  Therefore  more  efficient  methods  of  reduct  computation  have  been 
proposed.  For  details  see  references  in  [26]. 
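For a table as small as Table 1, the definition can nevertheless be applied directly by exhaustive search. The
Python sketch below is such a brute-force illustration (it is not one of the efficient methods referred to above,
and all names in it are hypothetical): it enumerates subsets of condition attributes by increasing size and keeps
those that preserve the degree of dependency and contain no smaller reduct. The number of subsets grows
exponentially with the number of attributes, which is why this approach does not scale.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

ATTRS = ("E", "Q", "L", "P")
ROWS = {  # Table 1: store -> (E, Q, L, P)
    1: ("high", "good", "no", "profit"),
    2: ("med.", "good", "no", "loss"),
    3: ("med.", "good", "no", "profit"),
    4: ("no", "avg.", "no", "loss"),
    5: ("med.", "avg.", "yes", "loss"),
    6: ("high", "avg.", "yes", "profit"),
}

def partition(attributes):
    idx = [ATTRS.index(a) for a in attributes]
    blocks = defaultdict(set)
    for store, values in ROWS.items():
        blocks[tuple(values[i] for i in idx)].add(store)
    return list(blocks.values())

def gamma(condition, decision):
    """Fraction of objects lying in the lower approximation of some decision class."""
    classified = sum(
        len(block)
        for x in partition(decision)
        for block in partition(condition)
        if block <= x
    )
    return Fraction(classified, len(ROWS))

def reducts(condition, decision):
    """All minimal subsets of condition that preserve gamma (exhaustive search)."""
    full = gamma(condition, decision)
    found = []
    for size in range(len(condition) + 1):
        for subset in combinations(condition, size):
            is_minimal = not any(set(r) <= set(subset) for r in found)
            if is_minimal and gamma(subset, decision) == full:
                found.append(subset)
    return found

print(reducts(("E", "Q", "L"), ("P",)))  # [('E', 'Q'), ('E', 'L')]
```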


DECISION  RULES 

Every dependency $C \Rightarrow_k D$ can be described by a set of decision rules in the form "if ... then", written
$\Phi \rightarrow \Psi$, where $\Phi$ and $\Psi$ are logical formulas describing conditions and decisions of the rule,
respectively, and are built up from elementary formulas (attribute, value) combined together by means of the
propositional connectives "and", "or" and "not" in the standard way.


An  example  of  a  decision  rule  is  given  below: 

if (E, med.) and (Q, good) and (L, no) then (P, loss).

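With every decision rule $\Phi \rightarrow \Psi$, we associate the conditional probability that $\Psi$ is true in S,
given $\Phi$ is true in S with the probability $\pi_S(\Phi)$, called a certainty factor and defined as follows:

$$\pi_S(\Psi \mid \Phi) = \frac{\mathrm{card}(|\Phi \wedge \Psi|_S)}{\mathrm{card}(|\Phi|_S)},$$

where $|\Phi|_S$ denotes the set of all objects satisfying $\Phi$ in S.

As well, we need a coverage factor [36]

$$\pi_S(\Phi \mid \Psi) = \frac{\mathrm{card}(|\Phi \wedge \Psi|_S)}{\mathrm{card}(|\Psi|_S)},$$

which is the conditional probability that $\Phi$ is true in S, given $\Psi$ is true in S with the probability
$\pi_S(\Psi)$.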

For  the  decision  rule  given  above  the  certainty  and  coverage  factors  are  1/2  and  1/3,  respectively,  i.e.,  the 
probability  that  the  decision  made  by  the  decision  rule  is  correct  equals  1/2  and  the  rule  covers  one  of  the  three 
decisions  indicated  by  the  rule. 
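These two values can be reproduced by counting objects in Table 1. The short Python sketch below is illustrative
only; it assumes rules whose condition and decision are conjunctions of (attribute, value) pairs, represented
here as dictionaries, and the names used are not from the original paper.

```python
from fractions import Fraction

TABLE = {  # Table 1: store -> attribute values
    1: {"E": "high", "Q": "good", "L": "no",  "P": "profit"},
    2: {"E": "med.", "Q": "good", "L": "no",  "P": "loss"},
    3: {"E": "med.", "Q": "good", "L": "no",  "P": "profit"},
    4: {"E": "no",   "Q": "avg.", "L": "no",  "P": "loss"},
    5: {"E": "med.", "Q": "avg.", "L": "yes", "P": "loss"},
    6: {"E": "high", "Q": "avg.", "L": "yes", "P": "profit"},
}

def satisfying(formula):
    """Objects whose values match every (attribute, value) pair of the formula."""
    return {s for s, row in TABLE.items()
            if all(row[a] == v for a, v in formula.items())}

condition = {"E": "med.", "Q": "good", "L": "no"}
decision = {"P": "loss"}
both = satisfying({**condition, **decision})   # objects satisfying condition and decision

certainty = Fraction(len(both), len(satisfying(condition)))
coverage = Fraction(len(both), len(satisfying(decision)))
print(certainty, coverage)  # 1/2 1/3
```

The condition is satisfied by stores 2 and 3, of which only store 2 has a loss (certainty 1/2); the loss is
reported for stores 2, 4 and 5, of which only store 2 satisfies the condition (coverage 1/3).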




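Let $\{\Phi_i \rightarrow \Psi\}_n$ be a set of decision rules such that all conditions $\Phi_i$ are pairwise mutually
exclusive, i.e., $|\Phi_i \wedge \Phi_j|_S = \emptyset$, for any $1 \le i, j \le n$, $i \neq j$, and

$$\sum_{i=1}^{n} \pi_S(\Phi_i \mid \Psi) = 1.$$

For any decision rule $\Phi \rightarrow \Psi$ in this set the following is true:

$$\pi_S(\Phi \mid \Psi) = \frac{\pi_S(\Psi \mid \Phi) \cdot \pi_S(\Phi)}{\sum_{i=1}^{n} \pi_S(\Psi \mid \Phi_i) \cdot \pi_S(\Phi_i)}.$$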

The relationship between the certainty factor and the coverage factor expressed by this formula is Bayes'
theorem. The formula shows that any decision table satisfies Bayes' theorem. This property gives a new
dimension to Bayesian reasoning methods and enables us to discover relationships in data without referring to
prior and posterior probabilities inherently associated with Bayesian philosophy. The above result is of special
value for large databases.
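The relationship can be verified numerically on Table 1 for the decision (P, loss). The Python sketch below is an
illustration under stated assumptions: the three conditions are the descriptions, in terms of E, Q and L, of the
elementary sets containing stores with a loss, so they are mutually exclusive and their coverage factors sum to
one; the helper names are hypothetical.

```python
from fractions import Fraction

TABLE = {  # Table 1: store -> attribute values
    1: {"E": "high", "Q": "good", "L": "no",  "P": "profit"},
    2: {"E": "med.", "Q": "good", "L": "no",  "P": "loss"},
    3: {"E": "med.", "Q": "good", "L": "no",  "P": "profit"},
    4: {"E": "no",   "Q": "avg.", "L": "no",  "P": "loss"},
    5: {"E": "med.", "Q": "avg.", "L": "yes", "P": "loss"},
    6: {"E": "high", "Q": "avg.", "L": "yes", "P": "profit"},
}

def satisfying(formula):
    return {s for s, row in TABLE.items()
            if all(row[a] == v for a, v in formula.items())}

def prob(formula):
    """Probability that the formula is true in the table."""
    return Fraction(len(satisfying(formula)), len(TABLE))

def cond_prob(formula, given):
    """Conditional probability that formula is true, given that `given` is true."""
    return Fraction(len(satisfying({**formula, **given})), len(satisfying(given)))

decision = {"P": "loss"}
conditions = [
    {"E": "med.", "Q": "good", "L": "no"},
    {"E": "no",   "Q": "avg.", "L": "no"},
    {"E": "med.", "Q": "avg.", "L": "yes"},
]

# Coverage factors of the rules sum to one, as the theorem requires.
assert sum(cond_prob(c, decision) for c in conditions) == 1

denominator = sum(cond_prob(decision, c) * prob(c) for c in conditions)
results = []
for c in conditions:
    via_bayes = cond_prob(decision, c) * prob(c) / denominator
    coverage = cond_prob(c, decision)
    results.append((via_bayes, coverage))
    print(via_bayes, coverage)   # each line prints: 1/3 1/3
```

For every rule the right-hand side of the formula, computed from certainty factors only, equals the coverage
factor counted directly from the table.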




CONCLUSION 

Rough set theory proves to be a very well suited candidate, alongside fuzzy sets, neural networks and other soft
computing methods, for intelligent industrial applications. Particularly challenging areas of application of
rough sets in an industrial environment are materials science, intelligent control, machine diagnosis and decision
support.

The rough set approach has many advantages, e.g., it identifies relationships that would not be found using
statistical methods, allows both qualitative and quantitative data and offers straightforward interpretation of
obtained results. Despite many successful applications of rough sets in industry, there are still problems which
require further research. In particular, suitable, widely accessible software dedicated to industrial
applications, as well as microprocessors based on rough set theory, are badly needed.

REFERENCES 

1.  A.  An,  C.  Chan,  N.  Shan,  N.  Cercone,  W.  Ziarko,  1997.  Applying  knowledge  discovery  to  predict  water- 
supply  consumption.  IEEE  Expert,  12(4),  72-78. 

2.  T.  Arciszewski,  W.  Ziarko,  1990.  Inductive  learning  in  civil  engineering:  rough  sets  approach. 
Microcomputers  and  Civil  Engineering,  5(1). 

3.  E.  Czogala,  A.  Mrozek,  Z.  Pawlak,  1995.  The  idea  of  rough-fuzzy  controller.  International  Journal  of 
Fuzzy  Sets  and  Systems,  72,  61-63. 

4.  A.G.  Jackson,  M.  Ohmer,  H.  Al-Kamhawi,  1994.  Rough  sets  analysis  of  chalcopyrite  semiconductor  band 
gap  data.  In:  T.  Y.  Lin  (ed.),  The  Third  International  Workshop  on  Rough  Sets  and  Soft  Computing 
Proceedings  (RSSC'94),  November  10-12,  San  Jose  State  University,  San  Jose,  California,  USA,  408-417. 

5. A.G. Jackson, S.R. Leclair, M.C. Ohmer, W. Ziarko, H. Al-Kamhwi, 1996. Rough sets applied to material
data. Acta Metallurgica et Materialia, 4475.

6.  A.G.  Jackson,  Z.  Pawlak,  S.R.  Leclair,  Rough  set  and  discovery  of  new  materials.  J.  of  Alloys  and  Comp, 
(in  press). 

7.  W.  Kowalczyk,  1996.  Analyzing  temporal  patterns  with  rough  sets.  In:  EUFIT-96:  The  fourth  European 
Congress  on  Intelligent  Techniques  and  Soft  Computing,  September  2-5,  Aachen,  139-143. 

8.  T.Y.  Lin,  1997.  Fuzzy  controllers:  an  integrated  approach  based  on  fuzzy  logic,  rough  sets,  and 
evolutionary  computing,  in:  T.  Y.  Lin  and  N.  Cercone  (eds.),  Rough  Sets  and  Data  Mining.  Analysis  for 
Imprecise  Data,  Kluwer  Academic  Publishers,  Boston,  London,  Dordrecht,  123-138. 

9.  T.Y.  Lin,  N.  Cercone,  (eds.),  1997.  Rough  Sets  and  Data  Mining  -  Analysis  of  Imperfect  Data,  Kluwer 
Academic  Publishers,  Boston,  London,  Dordrecht,  430. 




10. P. Lingras, 1996. Rough neural networks. Sixth International Conference, Information Processing and
Management of Uncertainty in Knowledge-Based Systems, Proceedings (IPMU'96), Volume II, July 1-5,
Granada, 1445-1450.

11.  A.  Mrozek,  1992.  Rough  sets  in  computer  implementation  of  rule-based  control  of  industrial  processes.  In: 
R.  Slowinski  (ed.),  Intelligent  Decision  Support.  Handbook  of  Applications  and  Advances  of  the  Rough 
Set Theory. Kluwer Academic Publishers, Boston, London, Dordrecht, 19-31.

12.  T.  Munakata,  1997.  Rough  control:  a  perspective.  In:  T.  Y.  Lin  and  N.  Cercone  (eds.),  Rough  Sets  and 
Data  Mining.  Analysis  for  Imprecise  Data.  Kluwer  Academic  Pub.,  Boston,  London,  Dordrecht,  77-88. 

13.  T.  Munakata,  1998.  Fundamentals  of  the  New  Artificial  Intelligence.  Springer,  231. 

14. R. Nowicki, R. Slowinski, J. Stefanowski, 1990. Possibilities of applying the rough sets theory to technical
diagnostics.  In:  Proceedings  of  the  IXth  National  Symposium  on  Vibration  Techniques  and  Vibroacoustics, 
December  12-14,  AGH  University  Press,  Krakow,  149-152. 

15.  R.  Nowicki,  R.  Slowinski,  J.  Stefanowski,  1992.  Rough  sets  analysis  of  diagnostic  capacity  of 
vibroacoustic  symptoms.  Journal  of  Computers  and  Mathematics  with  Applications,  24(2),  109-123. 

16.  R.  Nowicki,  R.  Slowinski,  J.  Stefanowski,  1992.  Evaluation  of  vibroacoustic  diagnostic  symptoms  by 
means  of  the  rough  sets  theory.  Journal  of  Computers  in  Industry,  20,  141-152. 

17.  A.  Oehm,  1993.  Rough  logic  control.  In:  (Project),  Technical  Report.  Knowledge  Systems  Group,  The 
Norwegian  University  of  Science  and  Technology,  Trondheim,  Norway. 

18.  E.  Orlowska  (ed.),  1997.  Incomplete  Information:  Rough  Set  Analysis.  Physica-Verlag,  Heidelberg. 

19. S.K. Pal, A. Skowron (eds.): Fuzzy Sets, Rough Sets and Decision Making Processes. Springer-Verlag,
Singapore (in preparation).

20. Z. Pawlak, 1991. Rough Sets - Theoretical Aspects of Reasoning about Data. Kluwer Academic
Publishers, Boston, London, Dordrecht, 229.

21. Z. Pawlak, 1998. Rough set theory and its applications to data analysis. Cybernetics & Syst., 29, 661-688.

22. Z. Pawlak, J. Grzymala-Busse, R. Slowinski, W. Ziarko, 1995. Rough sets. Commun. of ACM, 38, 88-95.

23. Z. Pawlak, T. Munakata, 1996. Rough control application of rough set theory to control. Fourth European
Congress on Intelligent Techniques and Soft Computing, Proceedings EUFIT'96, 1, 209-218.

24.  Z.  Pawlak,  1998.  Reasoning  about  data  -  a  rough  set  perspective.  In:  L.  Polkowski,  A.  Skowron  (eds.), 
Rough  Sets  and  Current  Trends  in  Computing,  Lecture  Notes  in  Artificial  Intelligence,  1424  Springer,  First 
International  Conference,  RSCTC'  98,  Warsaw,  Poland,  June,  Proceedings,  25-34. 

25. J.F. Peters, K. Ziaei, S. Ramanna, 1998. Approximate time rough control: Concepts and application to
satellite attitude control. In: L. Polkowski, A. Skowron (eds.), Rough Sets and Current Trends in
Computing, Lecture Notes in Artificial Intelligence, 1424 Springer, First International Conference,
RSCTC'98, Warsaw, Poland, June, Proceedings, 491-498.

26.  L.  Polkowski,  A.  Skowron  (eds.),  1998.  Rough  Sets  in  Knowledge  Discovery,  Physica-Verlag,  1(2). 

27. L. Polkowski, A. Skowron (eds.), 1998. Rough Sets and Current Trends in Computing, Lecture Notes in
Artificial Intelligence, 1424 Springer, Proc. 1st International Conference, RSCTC'98, Warsaw.

28.  R.  Slowinski,  1992.  Intelligent  Decision  Support.  Handbook  of  Applications  and  Advances  of  the  Rough 
Set  Theory.  Kluwer  Academic  Publishers,  Boston,  London,  Dordrecht. 

29.  R.  Slowinski,  1995.  Rough  set  approach  to  decision  analysis.  AI  Expert,  10,  18-25. 

30.  R.  Slowinski,  1995.  Rough  set  theory  and  its  applications  to  decision  aid.  Belgian  Journal  of  Operation 
Research,  Special  Issue  Francoro,  35(3-4),  81-90. 

31.  R.  Slowinski,  J.  Stefanowski,  R.  Susmaga,  1996.  Rough  set  analysis  of  attribute  dependencies  in  technical 
diagnostics.  In:  S.  Tsumoto,  S.  Kobayashi,  T.  Yokomori,  H.  Tanaka  and  A.  Nakamura  (eds.),  The  fourth 
International Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery, Proceedings (RS96FD),
November  6-8,  The  University  of  Tokyo,  284-291. 

32.  J.  Stefanowski,  R.  Slowinski,  R.  Nowicki,  1992.  The  rough  sets  approach  to  knowledge  analysis  for 
classification  support  in  technical  diagnostics  of  mechanical  objects.  In:  F.  Belli  and  F.  J.  Radermacher 
(eds.),  Industrial  &  Engineering  Applications  of  Artificial  Intelligence  and  Expert  Systems.  Lecture  Notes 
in Economics and Mathematical Systems 604, Springer-Verlag, Berlin, 324-334.

33.  A.  Szladow,  W.  Ziarko,  1993.  Adaptive  process  control  using  rough  sets.  Proceedings  of  the  International 
Conference  of  Instrument  Society  of  America,  ISA/93,  Chicago,  1421-1430. 

34.  A.  Szladow,  W.  Ziarko,  1993.  Application  of  rough  sets  theory  to  process  control.  Proceedings  of  Calgary 
93 Symposium of Instrument Society of America, Calgary.




35. S. Tsumoto, S. Kobayashi, T. Yokomori, H. Tanaka, A. Nakamura, (eds.), 1996. The Fourth International
Workshop on Rough Sets, Fuzzy Sets and Machine Discovery, Proceedings. The University of Tokyo.

36. S. Tsumoto, 1998. Modelling medical diagnostic rules based on rough sets. In: L. Polkowski, A. Skowron
(eds.), Rough Sets and Current Trends in Computing, Lecture Notes in Artificial Intelligence, 1424
Springer, First International Conference, RSCTC'98, Warsaw, Poland, June, Proceedings, 475-482.

37.  P.P.  Wang,  (ed.),  1995.  Second  Annual  Joint  Conference  on  Information  Sciences,  Proceedings. 
Wrightsville  Beach,  North  Carolina,  USA. 

38.  P.  Wang,  (ed.),  1997.  Joint  Conference  of  Information  Sciences,  Vol.  3.  Rough  Sets  and  Computer 
Sciences,  Duke  University,  Ga.,  USA. 

39.  W.  Ziarko,  1992.  Acquisition  of  control  algorithms  from  operation  data.  In:  R.  Slowinski  (ed.),  Intelligent 
Decision  Support,  Handbook  of  Applications  and  Advances  of  the  Rough  Set  Theory,  Kluwer  Academic 
Publishers,  Boston,  London,  Dordrecht,  61-75. 

40.  W.  Ziarko,  (ed.),  1993.  Rough  Sets,  Fuzzy  Sets  and  Knowledge  Discovery.  Proceedings  of  the 
International  Workshop  on  Rough  Sets  and  Knowledge  Discovery  (RSKD'93),  Banff,  Alberta,  Canada, 
October  12-15,  Springer-Verlag,  Berlin. 

41. W. Ziarko, J. Katzberg, 1989. Control algorithms acquisition, analysis and reduction: machine learning
approach. In: Knowledge-Based Systems Diagnosis, Supervision & Control, Plenum Press, Oxford, 167-78.

42.  W.  Ziarko,  J.  Katzberg,  1993.  Rough  sets  approach  to  system  modelling  and  control  algorithm  acquisition. 
Proceedings  of  IEEE  WESCANEX  93  Conference,  Saskatoon,  154-163. 

43. J. Zak, J. Stefanowski, 1994. Determining maintenance activities of motor vehicles using rough sets
approach. In: Proceedings of Euromaintenance'94 Conference, Amsterdam, 39-42.




From  Fuzzy  Set  Theory  to  Computational  Intelligence  - 
Special  European  Experiences 

H.-J.  Zimmermann 

Aachen  University  of  Technology,  Institute  of  Operations  Research,  Aachen,  Germany 


ABSTRACT 

Even though the first publication in the area of Fuzzy Set Theory (FST) appeared as early as 1965, the
development of this theory remained in the academic realm for almost 20 years. Almost all basic concepts,
theories and methods were, however, developed during this period.

Fuzzy Control opened the gate to real applications for FST. Particularly in Japan, the applications of the
fuzzy control principle in consumer goods made FST known to the public and made it commercially
interesting for industry. This led to two developments. First, since the development of fuzzy application
systems had to be efficient, fuzzy CASE tools and expert system shells were developed, turning FST into
Fuzzy Technology. Second, the success in Japan drew the attention of the media and started, first in
Germany, the "Fuzzy Booms", which led to an unprecedented growth in publications, university teaching
and industrial applications in many countries.

Around 1993 FST, Neural Nets and Evolutionary Computing joined forces and were soon considered to be
one area called Soft Computing or Computational Intelligence. Applications in Engineering as well as in
Management will be described during the presentation. Of particular interest for Europe might also be the
development of ERUDIT (European Network of Excellence for Fuzzy Sets and Uncertainty Modeling), a
network which grew from 15 nodes in 1995 to 250 nodes in 1997 and which has just been extended for
another two years by the European Commission. The possibilities offered within the framework of ERUDIT
will also be described in more detail.


HISTORICAL  DEVELOPMENT 

Fuzzy  Set  Theory,  Fuzzy  Technology  and  Computational  Intelligence 

Fuzzy Set Theory was conceived in 1965 as a formal theory which could be considered as a generalization
of either classical set theory or classical dual logic. In spite of the fact that Prof. Zadeh already had some
applications in mind when publishing his first contribution, Fuzzy Set Theory for several reasons remained
inside the academic sphere for more than 20 years. During these 20 years most of the basic concepts
which are nowadays used very successfully were invented. Starting at the beginning of the
80s, Japan was the leader in using a smaller part of Fuzzy Set Theory - namely fuzzy control - for practical
applications. In particular, improved consumer goods such as video cameras with fuzzy stabilizers, washing
machines including fuzzy control, rice-cookers etc. caught the interest of the media, which led around
1989/1990 to the first "fuzzy boom" in Germany. Many attractive practical applications - not so much in
the area of consumer goods but rather in automation and industrial control - led to the insight that the
efficient and affordable use of this approach could only be achieved via CASE tools. Hence, since the late
80s a large number of very user-friendly tools for fuzzy control, fuzzy expert systems, fuzzy data analysis
etc. have emerged. This really changed the character of this area and, to my mind, started the area of
"Fuzzy Technology". The next - and so far the last - large step in the development occurred in 1992 when,
almost independently in Europe, Japan and the USA, the three areas of Fuzzy Technology, artificial neural
nets and genetic algorithms joined forces under the title of "computational intelligence" or "soft
computing". The synergies which were possible between these three areas have since been exploited very
successfully. Figure 1 summarizes these developments.


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




[Figure 1: "Survey of Evolution". A timeline with a time axis marked 1965, 1975 and 1985, divided into an
Academic Stage, a Transfer Stage, the Fuzzy Booms, and a stage of Consolidation and Integration (Area of
Intelligent Systems). Rows: Theory and Methods (Fuzzy Sets, Fuzzy Decision, Fuzzy Linear Programming, Fuzzy
Control, Linguistic Variables, Fuzzy Measures, Fuzzy Clustering, Genetic Algorithms, Soft Computing);
Applications (Fuzzy Control of a Cement Kiln, Fuzzy Subway (Sendai), Fuzzy Video-Recorder, Fuzzy
Washing-Machines, control of brake systems, cranes, purification plants and heating systems, Fuzzy Data
Analysis in chemical industries, quality control and customer segmentation); Tools (1. Fuzzy Chip, Fuzzy C,
TIL-Shell, Fuzzy Tech, 1. Fuzzy-Neuro Chip, Fuzzy SPS, DataEngine).]


Fig.  1.  From  Fuzzy  Set  Theory  to  computational  intelligence 

Management,  engineering  and  other  areas  can  be  supported  by  computational  intelligence  in  many  ways. 
This  support  can  refer  to  information  processing  as  well  as  to  data  mining,  choice  or  evaluation  activities  or 
to  other  types  of  optimization.  Classical  decision  support  systems  consist  of  data  bank  systems  for  the 
information  processing  part  and  algorithms  for  the  optimization  part.  If,  however,  efficient  algorithms  are 
unavailable  or  if  decisions  must  be  made  in  ill-structured  environments,  knowledge-based  components  are 
added  to  supplement  or  substitute  algorithms.  In  both  cases  Fuzzy  Technology  can  be  useful.  In  this  context 
it  may  be  useful  to  cite  and  comment  on  the  major  goals  of  this  technology  and  to  correct  the  very  common 
view  that  Fuzzy  Set  Theory  or  Fuzzy  Technology  is  exclusively  or  primarily  useful  to  model  uncertainty: 

a)  Modeling  of  uncertainty 

This  is  certainly  the  best  known  and  oldest  goal.  I  am  not  sure,  however,  whether  it  can  (still)  be 
considered  to  be  the  most  important  goal  of  Fuzzy  Set  Theory.  Uncertainty  has  been  a  very  important 
topic  for  several  centuries.  There  are  numerous  methods  and  theories  which  claim  to  be  the  only  proper 
tool  to  model  uncertainties.  In  general,  however,  they  do  not  even  define  sufficiently  or  only  in  a  very 
specific  and  limited  sense  what  is  meant  by  "uncertainty".  I  believe  that  uncertainty,  if  considered  as  a 
subjective  phenomenon,  can  and  ought  to  be  modeled  by  very  different  theories,  depending  on  other 
causes  of  uncertainty,  the  type  and  quantity  of  available  information,  the  requirements  of  the  observer  etc. 
In  this  sense  Fuzzy  Set  Theory  is  certainly  also  one  of  the  theories  which  can  be  used  to  model  specific 
types  of  uncertainty  under  specific  types  of  circumstances.  It  might  then  compete  with  other  theories,  but 
it  might  also  be  the  most  appropriate  way  to  model  this  phenomenon  for  well-specified  situations.  It  would 
certainly  exceed  the  scope  of  this  article  to  discuss  this  question  in  detail  here  [6]. 


b)  Relaxation 

Classical  models  and  methods  are  normally  based  on  dual  logic.  They,  therefore,  distinguish  between 
feasible  and  infeasible,  belonging  to  a  cluster  or  not,  optimal  or  suboptimal  etc.  Often  this  view  does  not 
capture reality adequately. Fuzzy Set Theory has been used extensively to relax or generalize classical
methods from a dichotomous to a gradual character. Examples of this are fuzzy mathematical
programming [5], fuzzy clustering [2], fuzzy Petri Nets [3] and fuzzy multi criteria analysis [4].

c)  Compactification 

Due to the limited capacity of human short-term memory or of technical systems, it is often not
possible either to store all relevant data or to present masses of data to a human observer in such a way
that he or she can perceive the information contained in these data. Fuzzy Technology has been used to
reduce the complexity of data to an acceptable degree, usually either via linguistic variables or via fuzzy
data analysis (fuzzy clustering etc.).

d)  Meaning  Preserving  Reasoning 

Expert System Technology has been in use for two decades and has in many cases led to
disappointment. One of the reasons for this might be that expert systems, when their inference engines
are based on dual logic, perform symbol processing (truth values true or false) rather than knowledge
processing. In Approximate Reasoning, meanings are attached to words and sentences via linguistic
variables. Inference engines then have to be able to process meaningful linguistic expressions, rather than
symbols, and arrive at membership functions of fuzzy sets, which can then be retranslated into words and
sentences via linguistic approximation.

e)  Efficient  Determination  of  Approximate  Solutions 

Already  in  the  70s  Prof.  Zadeh  expressed  his  intention  to  have  Fuzzy  Set  Theory  considered  as  a  tool  to 
determine  approximate  solutions  of  real  problems  in  an  efficient  or  affordable  way.  This  goal  has  never 
really  been  achieved  successfully.  In  the  recent  past,  however,  cases  have  become  known  which  are  very 
good  examples  for  this  goal.  Bardossy  [1],  for  instance,  showed  in  the  context  of  water  flow  modeling  that 
it  can  be  much  more  efficient  to  use  fuzzy  rule  based  systems  to  solve  the  problems  than  systems  of 
differential  equations.  Comparing  the  results  achieved  by  these  two  alternative  approaches  showed  that  the 
accuracy  of  the  results  was  almost  the  same  for  all  practical  purposes.  This  is  particularly  true  if  one 
considers  the  inaccuracies  and  uncertainties  contained  in  the  input  data. 

The  development  of  Fuzzy  Technology  during  the  last  30  years  has,  roughly  speaking,  led  to  the  following 
application  oriented  classes  of  approaches: 

-  Model-based  (algorithmic)  Applications 

•  fuzzy  optimization  (fuzzy  linear  progr.  etc.) 

•  fuzzy  clustering  (hierarchical  and  obj.  function) 

•  fuzzy  Petri  Nets 

•  fuzzy  multi  criteria  analysis 

-  Knowledge-based  Applications 

•  fuzzy  expert  systems 

•  fuzzy  control 

•  fuzzy  data  analysis 

-  Information  Processing 

•  fuzzy  data  banks  and  query  languages 

•  fuzzy  programming  languages 

•  fuzzy  library  systems. 

For  almost  all  classes  of  application  mentioned  above  tools  (software  and/or  hardware)  are  available  to 
allow  efficient  modeling.  Institutionally  Fuzzy  Set  Theory  developed  very  differently  in  the  different  areas 
of  the  world.  The  first  European  Working  Group  for  Fuzzy  Sets  was  started  in  1975,  at  a  time  at  which 
Fuzzy Sets became visible at international conferences, such as NOAK (the Scandinavian Operations Research
Conference), the IFORS Conference in Toronto, and the 1st USA-Japan Symposium in Berkeley.

At the beginning of the 80s, national societies were founded in the USA (NAFIPS) and Japan (SOFT), and
at almost the same time a worldwide society, IFSA, was started.




When the 3rd World Congress of IFSA took place in Tokyo, Fuzzy Technology was already well known in
the Japanese economy, where it had been successfully applied not only to consumer goods (washing machines, video
cameras, rice cookers) but also to industrial processes (cranes etc.) and to public transportation (the subway
system in Sendai). In the rest of the world it was still little known and was primarily considered an
academic area.



The  European  Development 

In contrast to Japan and the USA, Europe is very heterogeneous economically, culturally and scientifically.
When the "Fuzzy Boom" was triggered in 1989/90 by the media, which had observed the fast development of
this technology in Japan, there were approximately ten research groups in the area of Fuzzy Sets in
different European countries, but they hardly communicated with each other and in some cases hardly knew of each other.
They were working at an international level but were not very application-oriented.


In  this  situation  the  fear  grew  that  Europe  would  again  lose  one  of  the  major  market  potentials  to  Japan. 
What  seemed  to  be  needed  most  was  communication  and  cooperation  between  European  countries  and 
between  science  and  economy.  Neither  a  company  nor  a  university  seemed  to  have  the  standing  to  bring 
this about. Hence, a foundation (ELITE = European Laboratory for Intelligent Techniques Engineering)
was established. It was much smaller and had much less public support than LIFE in Japan, which had very
similar objectives. The media and strong public interest had a marked influence on the universities, and
within one to two years the European Commission had been convinced of the economic importance of this
area. Via a European Working Group on Fuzzy Control, one of the European Networks of Excellence was
dedicated to Fuzzy Technology (ERUDIT). It became a European framework in which new theoretical and
practical developments were and are triggered, supported and advanced in a methodical and interdisciplinary way.
Some of its important features are

-  its  structure, 

-  its  growth, 

-  its  orientation,  and 

-  its  services. 

These  are  depicted  in  the  following  figures.  The  structure  is  a  matrix  organization  with  the  functions  and 
methods  horizontally  intersecting  all  sectors  of  the  economy. 


Fig.  2.  ERUDIT  -  Structure 




Fig.  3.  Committee  Structure. 


Its growth and orientation are shown in Figure 4. A self-imposed constraint ensures the application
orientation, and focussed activities in several directions lead to steady growth.


Fig. 4. Composition and growth of ERUDIT (active, pending and total nodes).


Services  offered  are  geared  to  the  requests  of  the  nodes  as  shown  in  Figure  5. 


Fig.  5.  Requested  Services 


Extensive  surveys  also  allow  very  focussed  activities  to  advance  the  area  in  scope  and  depth.  Figures  6  and 
7  sketch  results  from  these  surveys. 


At  present  a  merger  of  the  Networks  of  Excellence  for  Fuzzy  Sets,  Neural  Nets,  Evolutionary  Computing 
and  Machine  Learning  is  being  implemented  and  a  very  extensive  survey  is  to  be  executed.  The  results  of 
this will be ready by June 1999 and will be reported at IPMM'99 in Hawaii in July 1999.


Fig.  7.  Application  Areas. 


What conclusions can be drawn from the European experience described above? Perhaps that strong and
steady growth of a technology, even under difficult conditions such as those in Europe, can be achieved if the media are
intensively included in the promotional activities and if the development is not left to chance, but
communication, initialization, support and technology transfer are improved systematically and steadily.


REFERENCES 

1. A. Bardossy, 1996. The Use of Fuzzy Rules for the Description of Elements of the Hydrological Cycle.
Ecological Modelling, 85, 59 - 65.

2. J. C. Bezdek and S. K. Pal, 1992. Fuzzy Models for Pattern Recognition. New York.

3. H.-P. Lipp, R. Günther and P. Sonntag, 1989. Unscharfe Petri Netze - Ein Basiskonzept für
Computerunterstützte Entscheidungssysteme in Komplexen Systemen. Wissenschaftliche Schriftenreihe
der TU Chemnitz, 7.

4. H.-J. Zimmermann, 1986. Multi Criteria Decision Making in Crisp and Fuzzy Environments, in:
Zimmermann, Jones and Kaufman (edtrs.). Fuzzy Set Theory and Applications. Dordrecht, 233 - 256.

5. H.-J. Zimmermann, 1996. Fuzzy Set Theory - and Its Applications. 3rd rev. edit. Boston.

6. H.-J. Zimmermann, 1997. A Fresh Perspective on Uncertainty Modeling: Uncertainty vs. Uncertainty
Modeling, in: B. M. Ayyub and M. M. Gupta (edtrs.). Uncertainty Analysis in Engineering and
Sciences: Fuzzy Logic, Statistics, and Neural Network Approach. International Series in Intelligent
Technologies, Kluwer Academic Publishers, 353 - 364.




Telemining™ Systems Applied to Underground
Hard Rock Metal Mining at Inco Limited

Gregory  R.  Baiden 

INCO  Mines  Research,  Sudbury,  Ontario,  Canada 
Email:  gbaiden@inco.com 


ABSTRACT 

The  introduction  of  intelligent  systems  to  the  underground  mining  of  nickel,  copper,  cobalt  and  precious 
metals  can  provide  significant  improvement  in  the  performance  of  a  mine  when  compared  with 
conventional  mining  techniques.  This  paper  presents  the  concepts  of  Telemining  in  an  actual  case  study 
showing  how  the  systems  work  and  the  competitive  advantage  they  provide  for  Inco’s  underground  metal 
mining  operations  today  and  into  the  future. 


INTRODUCTION 

Inco Limited is a metal processing company that produces 17 minerals and chemicals from raw ore in the
form of sulphide and laterite deposits. Sulphide deposits have enjoyed a competitive advantage over
laterites to date, primarily due to the cost of metal processing. Recent advances in the processing of laterite
deposits have put considerable pressure on sulphide producers to lower costs to be competitive. The main
difference between sulphide producers and laterite producers is the cost of mining. Laterite producers enjoy
low  mining  costs  due  to  the  utilization  of  open  pit  mining  techniques  while  underground  sulphide  producers 
have  higher  mining  costs  because  of  underground  extraction  techniques.  This  paper  describes  Telemining 
techniques  that  can  potentially  close  the  gap  between  the  mining  costs  of  laterite  and  sulphide  producers. 


TELEMINING™

Telemining  is  the  application  of  remote  sensing  and  remote  control  of  mining  equipment  and  systems.  The 
main  technical  ingredients  are: 

•  Advanced  underground  mobile  computer  networks 

•  Underground  positioning  systems 

•  Mining  process  monitoring  and  control  software  systems 

•  Mining  methods  designed  specifically  for  Telemining 

•  Advanced  mining  equipment 

These ingredients are shown in Figure 1.


Fig.  1.  Core  Technologies  for  Telemining. 


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




Advanced  underground  mobile  computer  networks  allow  the  mining  process  to  be  connected  to  Operations 
Centers  for  the  running  of  the  mining  process.  Inco,  in  conjunction  with  IBM  and  Ainsworth  Electric, 
developed  this  advanced  mobile  computer  network  in  the  early  1990s.  It  consists  of  a  high  capacity  CATV 
network  linked  to  radio  cells  that  are  located  in  central  areas  in  the  levels  of  the  mine.  The  capacity  of  the 
system  was  designed  to  provide  2.4  GHz  of  bandwidth  allowing  operation  of  mobile  telephones,  handheld 
computers,  mobile  computers  on  board  machines  and  multiple  video  channels  to  run  multiple  pieces  of 
mining  equipment.  These  systems  have  the  capacity  to  fulfill  all  the  needs  of  an  individual  mine  for 
operation  from  a  surface  control  room. 



Fig.  2.  Underground  Telecommunication  System. 

Underground positioning systems have been established that provide enough accuracy to locate mobile
equipment in real time. The ability to perform this function has a number of practical uses, including
machine setup, hole location and remote surveying. The systems that have been developed function
similarly to Global Positioning Systems (GPS). The positioning equipment used consists of a
Ring-Laser-Gyro. These units will be mounted on all drilling machines so that surface operators can position the
equipment without needing to go underground. The test bed machine for trying these systems is shown
below. This unit is capable of surveying a 1 km drift (tunnel) in a few hours, while current work practices
require several days. The output of this work is the ability to develop "virtual reality" as-builts of the mine.


Fig.  3.  Underground  Positioning  Unit  and  Surveying  Machine. 


Mine  Planning,  Simulation  and  Process  Control  Systems  are  the  next  advance  to  be  made.  The  linking  of 
engineering  directly  with  operations  is  key  to  the  proper  application  of  telemining.  Present  mine  planning 
systems provide visualization of the mineralization available for mining. It is not until the method has been
simulated that a true idea of the output of an operation in a given timeframe can be formed. Since
planning and simulation have now been accelerated, an iterative approach is less onerous, thus
freeing time for optimizing the plan. The output of this high-level plan now becomes the input for a more
detailed plan using an MRP III (Manufacturing Resource Planning) system. This establishes schedules for
individual work tasks that are fed directly to the mining machines connected over the network. Feedback is
provided by a base data collection system residing in a Spatial Database.




The technologies discussed here will be applied to the processes of exploration, drifting and stoping. These
process systems will allow teleremote operation of mining machinery. The main process machinery
will be diamond drills, drifting drills, explosives loading machines, Load-Haul-Dump (LHD) machines and ground
support units.


RESEARCH  MINE 

Let  us  now  use  the  Inco  Limited  Research  Mine  -  175  ore  body  as  a  case  study  to  describe  how  this  plan 
will  work.  Key  to  improving  the  value  of  a  mineralized  area  is  the  speed  with  which  we  mine  the  ore  and 
the  cost  of  doing  the  actual  work.  By  far  the  driving  factor  to  increase  mine  value  is  the  reduction  of  cycle 
time. Given that the 175 area has a total mineralization of approximately 8 million tons (see Figure 4)
grading 0.5% Cu and Ni combined, the challenge is to use these techniques to bring the ore production
forward.


Fig.  4.  Research  Mine  Mineralization. 


The  process  of  mining  consists  of  four  overall  components:  delineation,  development,  production  and 
materials  handling.  Delineation  is  the  continuous  proving  of  the  ore  grade  quality  to  locate  the  closed 
mineral  resources.  Delineation  today  consists  of  a  geologist  making  a  request  for  tonnage  and  grade 
information. This request is sent to a diamond drilling and/or geologic probing group. In this group, the
reduction in cycle time through surface operation makes on-line information available, giving faster
turnaround in the continuous planning and redefinition of the ore body so that planning can be altered dynamically
where possible. Today turnaround times on information are about 3 months. Surface teledelineation with
probes will reduce this cycle time to minutes through the introduction of Spatial Information Systems linked to
teledrilling machines that probe as they go.

Today  development  times  are  typically  24  hours  for  total  cycle  in  our  industry.  This  is  using  conventional 
tools  such  as  the  2-boom  electric/hydraulic  jumbo,  hand  explosives  loading,  mucking  and  ground  support. 
Teledevelopment  will  provide  a  compressed  development  cycle  time,  as  the  available  time  for  work  will  be 
increased  by  30%.  This  will  occur  through  teleoperation  of  a  computer  controlled  telejumbo,  a  computer 
controlled  explosives  loader,  a  teleoperated  LHD  and  the  use  of  more  expensive  but  faster-setting  ground 
support coatings. Synchronization of the development cycle will provide a 100% improvement in the rate
of development, while the utilization of people will be enhanced through on-surface teleoperation of the
machinery. A 100% improvement in rate, i.e. a doubling, significantly changes the cash flow of the entire
mine. This can be seen in the results of the simulator comparison below, based on projections for the
teledevelopment process under study.

Production  cycle  time  is  less  significant  to  the  cash  flow  than  is  speeding  up  development  but  each 
reduction  in  cycle  times  will  help.  The  real  challenge  is  to  control  drilling  and  blasting  so  as  to  avoid 
slowing  the  process.  Today  we  use  longhole  drills,  explosives  loaders  (typically  AN/FO)  and  LHDs 
followed  by  backfilling  with  rock  or  tailings,  if  required.  Teleproduction  at  the  research  mine  will  be  done 
with  Tamrock  Datasolo  drills  specifically  designed  for  reliability  and  teleoperation,  Dyno  Nobel’s  latest 
emulsion  explosive  loader  using  microprocessor-based  caps,  and  LHDs  and  Trucks  for  material  handling. 




Simulation of the research mine shows that with blasting techniques such as those described, the mine is
able to process under-grade ore profitably since it can be mined faster, thus increasing its value.

The  keys  to  synchronizing  the  process  are  software  tools  that  become  the  mine  control  system.  Just-in-time 
techniques  that  balance  mine  development  and  stope  inventory  against  the  acceleration  of  the  mining  rate 
provide cost savings while still achieving value enhancement by reducing the overall mine cycle time. A
KANBAN production order processing system with direct links to the mining machines will improve the
efficiency of mining well in excess of the results already demonstrated by manufacturers such as Toyota.


FIELD  TEST  WORK 

Initial  test  work  at  Inco  has  been  completed  in  the  processes  of  delineation,  development  and  production. 
Preliminary  results  are  shown  in  the  following  examples  including  Teledelineation,  Teledevelopment  and 
Teleproduction. 

Teledelineation combines diamond drilling technology with teleremote operation and on-line grade
collection. Work to date has seen process automation of the drilling together with rod changing, a prototype
SIS and new infrared probes. A conceptual picture is shown in Figure 5. The work completed to date
significantly enhances the reliability of the drill through the addition of computer control
packages and a rod-handling package. Control improvements have reduced consumables. To date our
testing shows:

•  bit life has improved on average by 300% with some results as high as 400%.

•  rod life has improved by 100%.

Both results show the improved consistency and reliability that are absolutely necessary for teleoperation. Spatial
database development has been a key ingredient in compressing cycle times. Probing results can go directly
into the database, where geologists and engineers have the latest information to use in planning at all
times. Database development has been an off-shoot of oceanographic work. A drill hole database that links
directly to mine planning systems is now available.


Fig.  5.  Delineation  Drilling  System  combined  with  a  Real-Time  Software  System. 


Teledevelopment  comprises  four  machines  at  present.  These  include  a  teleremote  horizontal-drilling 
machine,  an  explosives-loading  unit,  a  materials-handling  machine  (LHD)  and  a  ground-support  machine. 
The  drilling  machine  requires  a  change  in  the  drilling  pattern  to  provide  consistency.  New  patterns  have 
been  developed  to  improve  the  reliability  of  the  tunneling  process  as  part  of  the  teledevelopment  strategy. 




The  computerized  horizontal  drill  (jumbo)  has  a  computer  system  that  is  linked  to  the  underground  network 
to  provide  on-line  engineering  information  to  drill  the  pattern.  The  underground  positioning  system 
provides  location  coordinates  of  a  machine  for  rapid  setup  and  accurate  drilling.  These  types  of  control 
systems  are  a  must  in  teleremote.  Automated  explosives  loaders  are  beginning  to  be  developed  and 
teleoperated  LHDs  are  working  in  the  field  achieving  significant  production  improvements  today. 

Teleproduction  combines  three  machines  that  must  interact  in  stoping.  These  units  are  a  longhole 
production  drill,  explosive  loader  and  an  LHD.  The  machines  running  teleremotely  will  reduce  cycle  times 
and  cost.  Today,  testing  is  underway  on  the  drilling  and  LHDs  for  production.  The  results  of  this  work  are 
quite  promising.  Drilling  performance  has  seen  teledrilling  with  3  drills  and  3  operators  with  3  rovers 
achieve  the  same  production  as  a  conventional  mining  scenario  of  5  drills  with  20  operators.  These 
improvements  remove  2  drills  and  reduce  the  need  for  14  operators.  As  well,  these  drills  operate  for  7.5 
hours  of  a  current  8-hour  shift  in  comparison  to  manual  drills  that  only  run  5  hours  for  the  same  8-hour 
shift. The LHD performances are similar. The LHDs are moving material at the same rate as current LHDs,
with highlights of a single machine operating 23 hours out of a 24-hour day in comparison to a normal
production unit that would work 15 hours out of a 24-hour day.
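
The utilization figures quoted above can be checked with a short calculation. This is only a worked restatement of the numbers in the text; the variable names are illustrative, and counting the three rovers as personnel alongside the three operators is an assumption that is consistent with the stated reduction of 14 operators.

```python
def utilization(hours_run, hours_available):
    """Fraction of the available time during which a machine is working."""
    return hours_run / hours_available


# Drills: 7.5 h of an 8 h shift (teleoperated) versus 5 h (manual).
tele_drill = utilization(7.5, 8.0)
manual_drill = utilization(5.0, 8.0)
drill_gain = tele_drill / manual_drill

# LHDs: 23 h of a 24 h day (teleoperated) versus 15 h (conventional).
tele_lhd = utilization(23.0, 24.0)
manual_lhd = utilization(15.0, 24.0)
lhd_gain = tele_lhd / manual_lhd

# Personnel: 5 drills with 20 operators versus 3 drills with
# 3 operators and 3 rovers (assumed here to count as 6 people).
conventional_people = 20
tele_people = 3 + 3
people_saved = conventional_people - tele_people
drills_removed = 5 - 3

print(f"drill utilization: {tele_drill:.1%} vs {manual_drill:.1%} (x{drill_gain:.2f})")
print(f"LHD utilization:   {tele_lhd:.1%} vs {manual_lhd:.1%} (x{lhd_gain:.2f})")
print(f"drills removed: {drills_removed}, people saved: {people_saved}")
```

On these figures the working time per machine rises by roughly one half for both drills and LHDs, which is separate from the saving in machines and personnel.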


MINE  OPERATION  CENTER 

A  Mine  Operation  Center  connected  to  multiple  mines  offers  the  opportunity  to  enhance  further  the 
utilization  of  people  and  equipment.  An  MOC  connects  to  the  individual  headframes  allowing  control  of 
geographically  dispersed  equipment.  An  example  that  is  currently  running  as  a  prototype  at  Inco  is  shown 
in  Figure  6.  This  prototype  has  been  put  in  place  to  test  the  technical  and  financial  feasibility.  Some  of  the 
questions  being  answered  are: 

•  How  many  pieces  of  machinery  can  a  single  operator  run? 

•  What  are  the  logistics  issues  of  operating  this  way? 

•  How  should  the  Operations  Center  be  organized? 


Fig. 6. Mine Operations Center Concept (Mine Operations Centre linked to multiple mine sites; prototype running in 1999).


BENEFITS 

The main benefits of this style of operation are improvements in safety, productivity, value-added time and
cycle time. Safety improves since an operator spends less time underground. Productivity shifts from one
person per machine to one person per three machines. Value-added time has been demonstrated by having
23 continuous hours of operation in a 24-hour period. These results reduce the time of the development and
production cycles.


To  determine  these  benefits,  several  telemining  simulators  of  the  Research  Mine  have  been  created.  These 
simulators  compare  the  timing  of  conventional  mining  processes  to  telemining  for  the  Sub-Level  Retreat 
mining  method  (see  Figure  7).  The  benefits  of  this  style  of  underground  mine  operation  are  significant. 
Recent results show a reduction in project life of one year and a change in net cash flow of $3
million, or -25% to +30% (see Figure 8), for telemining over conventional mining techniques. Taking this one
step further, a central Mine Operation Center (MOC) that runs three equivalent mines indicates returns
and cash flow changes of +130% and +$35 million.


Fig. 7. Research Mine Sublevel Retreat Design.


Fig. 8. Financial Comparison of Operating Methods (net cash flow against years for Manual, Tele-Ops and MOC operation).


SUMMARY 

Telemining  is  a  direction  in  which  underground  mining  companies  must  head  if  they  intend  to  continue 
operating  in  the  ever-tightening  base-metal  commodity  business.  The  techniques  shown  in  this  paper  are 
similar  to  Computer  Integrated  Manufacturing  (CIM)  methods  that  provide  a  consumer  with  a  continuous 
supply  of  cheap  products.  Application  of  these  advanced  manufacturing  techniques  to  mining  will  allow 
sulphide  deposits  to  significantly  reduce  production  costs  to  remain  competitive  with  laterite  deposits. 




Soft  Sensors  for  Processing  Plants 

G.D.  Gonzalez 

Department  of  Electrical  Engineering,  University  of  Chile, 
Santiago,  Chile 


ABSTRACT 

Soft-sensors assist in solving the problem created by the unavailability of a sensor by providing a software
backup for it, thus allowing a reduction of losses in plant performance. Similarly, the use of a soft-sensor
to estimate a plant variable for which no sensor is installed provides an occasion for improving the
performance of a plant. The core of a soft-sensor is a partial model of a plant allowing the generation of an
estimated measurement to replace missing actual measurements. Coupled with the model there is a
problem of signal estimation, interpolation and prediction. Models considered here are black box models
(e.g., ARMAX, NARMAX, Neural Network, Cluster Analysis) and gray box models, which include
phenomenological knowledge. Also, comparisons are made concerning the requirements for soft-sensor
models and for models used in model-based control. An approach to an integrated view of the various
soft-sensor modeling methods is attempted. Among other aspects considered are: (i) how a soft-sensor may be
used for interpolation and prediction of measurements having a sampling rate which is too low, (ii)
performance of a control loop when a sensor is replaced by a soft-sensor, (iii) performance of the
soft-sensor indication in the period following the replacement of the actual sensor by a soft-sensor as plant
characteristics change, (iv) complementary considerations for ensuring the availability of soft-sensors in
industrial environments and (v) problems related to the use of soft-sensors in automatic control loops. A
review of a sample of the technical literature on applications, as well as on research and development
in this field, is discussed in the text and summarized in a table.

INTRODUCTION 

Sensors  are  the  eyes  through  which  the  behavior  of  a  plant  is  seen  and  information  about  its  performance  is 
obtained.  But  the  measurements  from  a  sensor  may  become  unavailable  because  of  failure,  removal  for 
maintenance  or  repairs.  Also,  it  may  happen  that  the  sampling  is  infrequent,  i.e.,  the  plant  variable  to  be 
measured  is  not  sampled  at  a  fast  enough  rate,  so  that  the  samples  do  not  represent  the  actual  evolution  of 
such a variable, because of a violation of Shannon's sampling theorem [1]. This happens when a sensor is
time-shared for performing measurements at different points. It is also the case when no sensor is installed
(e.g., because it has not yet been developed or placed on the market, or because of its high cost) and
laboratory analyses are required. In addition, it may be that a sensor is not very robust so that, instead of
having it always on-line, it is used to take infrequent samples [2]. Then, because a set of
measurements is totally missing or incomplete during a certain period of time, the plant performance will
almost surely be impaired. Soft-sensors may provide a convenient solution to eliminate or at least palliate
this problem [1, 3, 4, 5, 6, 7, 8], even in cases when the sampling rate is inadequate, if certain conditions
are met [2, 9, 10, 11, 12] (see Table 1 below).
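
Why infrequent sampling fails to represent the evolution of a variable can be seen in a minimal numerical sketch. The frequencies used (a 0.9 Hz sinusoid sampled at 1 Hz) are arbitrary illustrative assumptions: at that rate the samples coincide with those of a much slower sinusoid of frequency -0.1 Hz, so the two signals cannot be told apart from the samples alone.

```python
import math


def sample(frequency_hz, sampling_rate_hz, n_samples):
    """Samples of sin(2*pi*f*t) taken at t = k / sampling_rate, k = 0..n-1."""
    return [
        math.sin(2.0 * math.pi * frequency_hz * k / sampling_rate_hz)
        for k in range(n_samples)
    ]


# Illustrative values only: the signal varies at 0.9 Hz but is sampled at
# 1 Hz, well below the 1.8 Hz required by the sampling theorem.
fast_signal = sample(0.9, 1.0, 20)

# The alias: a sinusoid at 0.9 - 1.0 = -0.1 Hz gives the same samples.
slow_alias = sample(-0.1, 1.0, 20)

max_gap = max(abs(a - b) for a, b in zip(fast_signal, slow_alias))
print(f"largest difference between the two sample sets: {max_gap:.2e}")
```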

In general, soft-sensors supply the estimation of the missing measurements by using a model that relates the
corresponding variable to other measurements that are correlated with it (Fig. 1). The fact that the model
is implemented by means of software (hence the name soft(ware)-sensor) means that soft-sensors provide a
software back-up for unavailable sensors, as an alternative to a hardware back-up using spare sensors.
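
As a minimal sketch of this idea, the following fits a straight line between one secondary measurement and the variable of interest while both are available, and then uses the line as a software back-up. The single-input linear model, the class name `SoftSensor` and the data values are all illustrative assumptions; the models discussed in this paper are generally richer (ARMAX, neural networks, gray box models).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


class SoftSensor:
    """Estimates a primary variable from one correlated secondary measurement."""

    def __init__(self, slope, intercept):
        self.slope = slope
        self.intercept = intercept

    def estimate(self, secondary):
        return self.slope * secondary + self.intercept


# Hypothetical history recorded while the real sensor was still working.
secondary_history = [1.0, 2.0, 3.0, 4.0, 5.0]
primary_history = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(secondary_history, primary_history)
sensor = SoftSensor(slope, intercept)

# The real sensor is now unavailable; only the secondary measurement is read.
estimate_at_6 = sensor.estimate(6.0)
print(f"estimated primary variable: {estimate_at_6:.2f}")
```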

Two main topics relate to the problem of maintaining the availability of measurements in a plant
by means of soft-sensors:

i)  The  soft-sensor  model,  and 

ii)  The  detection  that  a  sensor  has  failed  together  with  the  identification  of  which  is  the  failing 
sensor  out  of  a  set  of  sensors. 




Fig.  1.  Soft-sensor  system  replacing  failed  sensor 
with  model  having  correlated  secondary 
measurements  as  inputs. 


Concerning  the  models,  some  authors  define  soft-sensors  only  on  the  basis  of  neural  network  models  (e.g., 
[12]),  but  a  broader  point  of  view  should  be  adopted,  since  soft-sensor  models  are  also  obtained  using 
regression  or  correlation  techniques,  as  well  as  fuzzy  logic  or  first  principles  models,  or  combinations  of 
them. But developments seem to have followed independent paths and, as a result, different names are
given to the same concepts, e.g.: parameters = weights, parameter estimation = training, new data tests =
generalization.

Other  subjects  of  interest  concerning  soft-sensors  are: 

a)  Performance of a control loop when a sensor is replaced by a soft-sensor [13, 14].

b)  Performance of the soft-sensor indication in the period following the replacement of the actual sensor
by a soft-sensor as plant characteristics change [13, 14, 15].

c)  Complementary considerations for ensuring the availability of soft-sensors in industrial
environments [3, 4, 16, 17].

This  paper  deals  mainly  with  the  subject  of  the  soft-sensor  by  itself.  Although  fault  detection  and 
identification  is  an  important  and  interesting  associated  subject,  closely  connected  with  soft-sensor 
modeling  as  witnessed  in  [6,  7,  8],  lack  of  space  prevents  an  adequate  treatment  of  this  matter  here. 


MODELS 

The  core  of  a  soft-sensor  is  a  partial  model  of  a  plant  allowing  the  generation  of  a  virtual  measurement  to 
replace  a  real  sensor  measurement.  Coupled  with  the  model  there  is  a  problem  of  signal  estimation, 
interpolation  and  prediction.  The  model  must  generate  the  missing  measurements  in  terms  of  other 
measured variables with which these measurements are correlated. This process may be stated by paralleling
the definition of identification given by L. Zadeh: a class of models, a class of input signals, and a
criterion are defined in order to determine the best model (Fig. 2). The problem consists of finding which
model belonging to the class of models (e.g., a subset of ARMAX models, a subset of neural network
models) is best, according to the chosen criterion (usually defined in terms of the mean square error
between the model output or prediction and the measured plant variable to be represented by the
soft-sensor), for the specified class of signals. For this purpose only measured input and output variables of the
plant or process may be used. The modeling process depicted in Fig. 2 applies whether the models are
based on neural networks, regression, fuzzy logic, first principles (phenomenological), combinations of
them, etc. In general, models based only on input-output measurements will be called black box models,
those based on first principles will be termed phenomenological models, and those that use
input-output measurements together with some phenomenological knowledge about the plant will be
called gray box models.
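As a sketch of the criterion described above, the mean square error and the resulting choice of model can be written as follows. The symbols $N$ (number of samples), $\mathcal{M}$ (class of models), $J$ and $M^{*}$ are introduced here for illustration only; $y(t)$ is the measured plant variable and $\hat{y}_M(t)$ the output of candidate model $M$.

$$J(M) = \frac{1}{N}\sum_{t=1}^{N}\bigl[y(t)-\hat{y}_M(t)\bigr]^{2}, \qquad M^{*} = \arg\min_{M\in\mathcal{M}} J(M)$$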




Fig. 2. Choosing the best model. Here u is a vector of command (manipulated) variables, v is
a vector of measured disturbances, p is a vector of unmeasured disturbances, y is the
plant output and $\hat{y}$ the model output. The instantaneous error is $e(t) = y(t) - \hat{y}(t)$.

It  is  convenient  to  distinguish  two  stages  in  this  process  of  finding  the  best  model: 

•  Structure  determination  and 

•  Parameter  (weight)  estimation 

Both  structure  determination  and  parameter  estimation  should  be  done  with  one  set  of  data  -  the  training 
set  -  and  tested  with  a  different  set:  the  test  set. 
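The separation between training and test data can be sketched as follows. This is a minimal illustration, assuming a linear-in-parameters model with a constant term; the function names and the idea of describing each candidate structure by a list of column indices are assumptions made for the example, not part of the methods cited here.

```python
import numpy as np

def mse(y, y_hat):
    """Mean square error between measured and model outputs."""
    return float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

def fit_linear(X, y):
    """Least-squares parameters for y ~ [X, 1] (constant term last)."""
    A = np.column_stack([X, np.ones(len(X))])
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta

def predict_linear(theta, X):
    """Model output for the rows of X."""
    return np.column_stack([X, np.ones(len(X))]) @ theta

def select_structure(X_train, y_train, X_test, y_test, candidate_columns):
    """Fit every candidate structure on the training set and
    keep the one with the smallest mean square error on the test set."""
    best = None
    for cols in candidate_columns:
        theta = fit_linear(X_train[:, cols], y_train)
        err = mse(y_test, predict_linear(theta, X_test[:, cols]))
        if best is None or err < best[1]:
            best = (cols, err, theta)
    return best  # (columns, test error, parameters)
```

The parameters are never adjusted with the test set; it is used only to rank the candidate structures.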

Structure 

The  model  structure  stage  is  concerned  with  finding  a  form  for  the  soft-sensor  model.  For  example,  in  the 
ARMAX class of models this entails finding the most significant components correlated with the
soft-sensor signal, which will be used as input signals to the model. A component is a term in the model
having an associated parameter, and may be a measurement or an appropriate combination of
measurements (composite components). In the neural network case, the structure comprises the type of
network,  the  network  topology,  the  most  significant  measurements  to  be  used  as  inputs,  the  number  of 
layers,  the  selection  of  the  activation  functions,  and  the  number  of  nodes  in  each  layer.  Composite 
components  may  also  be  considered  as  suitable  inputs  to  a  neural  network.  A  given  structure  may  be  valid 
for  a  large  operating  region  of  a  process,  while  its  parameters  (weights)  may  need  to  be  updated  as  the 
operating  point  undergoes  small  or  large  changes.  Some  usual  structure  determination  methods  follow. 

Regression  Based  Models 

Model  structures  of  the  ARMAX  [28]  and  NARMAX  [5]  class  are  determined  using  regression 
techniques.  In  order  to  find  the  components  that  determine  the  model  structure  all  the  relevant  information 
is  used,  including  measurements  of  inputs,  internal  plant  variables,  and  measured  disturbances. 
Phenomenological  knowledge  of  the  plant  may  be  of  great  use  [1,  3,  4,  18].  Delays  associated  with  the 
different measurements are included in this determination. The structure determination then consists of
finding which components are most correlated with the soft-sensor variable. In order to do this,
methods  such  as  Piecewise  Regression  may  be  used.  In  this  way  the  model  is  simplified,  since  no  highly 
correlated  components  are  included  in  the  resulting  model.  In  addition  it  turns  out  that  the  standard 
deviation  of  the  error  in  the  determination  of  the  model  parameters  is  reduced.  An  example  of  an  ARMAX 
structure  having  both  kinds  of  components  is  given  in  (1). 

$$\hat{y}(t) = \theta_1\, y(t-1) + \theta_2\, x_j(t-1) + \theta_3\, \frac{\cdots}{x_5} + \theta_4 \sqrt{x_7} + \theta_5 \qquad (1)$$

where the secondary measurements are $x_j(t)$, $j = 1, \ldots, m$, and the sampling times are $t$.


Stepwise Regression. This is one of the methods used to find the most significant components
associated with the soft-sensor variable. It begins by selecting, from a list of candidate
components, the one most closely correlated with the variable to be modeled, i.e., the soft-sensor variable.
In this way a first partial model is determined. The residue of this model is then correlated with the
remaining candidate components. In each following stage, the component to be included is the
one that gives the largest partial correlation with the residue of the previous step, calculated after a
multilinear regression is performed with the previously selected components. With this procedure, the model
structure grows through the addition of one component at a time, the added component being the
one that contributes the greatest improvement in the goodness of fit of the model. Before each new
component is included in the model, though, the components to be included are tested for their statistical
significance. Each model parameter estimate has an estimated error standard deviation. The ratio between
this standard deviation and the coefficient value is used to decide on the inclusion of every new component. If
for any component the ratio exceeds a given threshold, the corresponding component is not included,
or is excluded if it had been previously included. This procedure is repeated until no component from the list of
candidate components is either deleted or included in the model being determined. The determination of
the model structure is then complete [29].
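The procedure just described can be sketched in a few lines. This is a simplified illustration and not the algorithm of [29]: it uses the plain correlation with the current residue instead of a partial correlation, never reconsiders a component once it has been excluded, and the threshold value `max_ratio` is an arbitrary assumption.

```python
import numpy as np

def ols(A, y):
    """Least-squares fit; returns coefficients, their standard errors, residue."""
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ theta
    dof = max(len(y) - A.shape[1], 1)
    sigma2 = float(resid @ resid) / dof
    cov = sigma2 * np.linalg.pinv(A.T @ A)
    return theta, np.sqrt(np.diag(cov)), resid

def stepwise(X, y, max_ratio=0.5):
    """Forward selection with deletion of non-significant components.

    X: (n, m) candidate components; y: (n,) soft-sensor variable.
    max_ratio: threshold on std_error / |coefficient| (assumed value).
    Returns the list of selected column indices.
    """
    n, m = X.shape
    ones = np.ones((n, 1))
    selected, rejected = [], set()
    resid = y - y.mean()
    while True:
        remaining = [j for j in range(m) if j not in selected and j not in rejected]
        if not remaining:
            break
        # Candidate most correlated with the residue of the current model.
        corr = np.nan_to_num([abs(np.corrcoef(X[:, j], resid)[0, 1]) for j in remaining])
        trial = selected + [remaining[int(np.argmax(corr))]]
        theta, se, _ = ols(np.hstack([X[:, trial], ones]), y)
        ratio = se[:-1] / np.maximum(np.abs(theta[:-1]), 1e-12)
        # Keep only the components whose ratio does not exceed the threshold.
        keep = [j for j, r in zip(trial, ratio) if r <= max_ratio]
        rejected.update(j for j in trial if j not in keep)
        selected = keep
        _, _, resid = ols(np.hstack([X[:, selected], ones]), y)
    return selected
```

Each pass either accepts or rejects the component under trial, so the loop ends after at most one pass per candidate.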

Using this method, with data gathered during experimental runs, several soft-sensors have been designed
off-line for industrial grinding and flotation plants [1, 3, 4, 18]. The list of candidate components
included  single  measurements,  as  well  as  composite  components  combining  various  measurements  to 
produce  components  having  physical  significance.  In  order  to  determine  the  composite  components  a 
phenomenological  model  of  the  plant  has  been  used  for  finding  the  general  form  of  the  relation  between 
the  soft-sensor  variable  and  other  measurements  in  the  plant.  It  turned  out  that  the  majority  of  the 
components  selected  by  the  stepwise  regression  method  are  of  the  composite  type  -  i.e.,  having  some 
physical  significance  -  instead  of  the  single  measurement  linear  components.  In  the  case  of  the  particle  size 
soft-sensor  for  a  grinding  plant  [3]  the  result  is  given  by 


$$F_{65}(t) = \theta_0 + \theta_1 F_{65}(t-1) + \theta_2\, J_B(t-1)\,[S_{wF}(t)] + \theta_3\, J_B(t-3)\, S_{wF}(t-3) + \theta_4\, \frac{[G_{L1}(t)]}{G_{SF}(t-2)\, P_s(t-3)} \qquad (2)$$

where the autoregressed soft-sensor measurement is

$F_{65}(t-i)$ = % in weight over the 65 mesh, in the final product of a section of the grinding plant,

and the secondary measurements are

$J_B(t-i)$ = Total power of the three ball mills in the grinding section.

$S_{wF}(t-i)$ = Average solids concentration in the section (% by weight) in the hydrocyclones feed, considering
the three grinding circuits of the section.

$G_{L1}(t-i)$ = Rod mill water feedrate; $P_s(t-i)$ = Sum of the three sump pumps speeds;

$S_w(t-i)$ = Solids concentration, % by weight;

$G_{L2}(t-i)$ = Sump water addition rate; $P_S(t-i)$ = Sump pump speed; $G_{SF}(t-i)$ = Fresh ore feedrate.

All  variables  were  placed  in  the  candidate  component  list  including  various  delays,  along  with  the 
composite component candidates including their delays. The result was that all selected components
depending on secondary measurements were of the composite type. The performance of this soft-sensor was
appreciably  better  than  the  model  based  on  only  single  measurement  components  in  the  candidate  list  [7]. 

Principal  Components 

In the Principal Component modeling methods [30] an n × m sample measurement matrix X is formed in
which each row is a vector $x^T(j)$ containing the m measurements performed in the plant at sample times j =
1, ..., n. The m eigenvalues and eigenvectors $p_i$ of the symmetric correlation matrix $X^T X$ are found.
Of the m eigenvectors a subset is selected that explains most of the variation of the data, e.g. by choosing
those having the largest eigenvalues. This subset spans the principal component space (PCS), while the
remaining eigenvectors span the residual space (RS). Since $X^T X$ is symmetric, all eigenvectors are
orthogonal to each other, and if they are normalized to Euclidean length 1, an orthonormal basis for the
PCS as well as for the RS is obtained. The measurement vectors x(j) are projected onto the PCS, $\hat{x}(j)$
being the projection. The components of vector $\hat{x}$ are called the principal components, and are mutually
orthogonal (uncorrelated).

In  the  Principal  Component  Analysis  method  a  missing  measurement  may  be  reconstructed  [6,  7,  8].  But 
more  than  one  measurement  may  be  reconstructed,  depending  on  the  dimension  s  of  the  PCS  and  the 
number  m  of  the  sensors.  Hence  several  soft-sensors  are  implicit  in  the  PCA  modeling  method  and  the  set 
of  secondary  measurements  for  any  of  them  depends  on  which  of  the  m  measurements  are  missing. 
Therefore  the  PCA  method  may  be  viewed  as  a  system  of  soft-sensor  models.  Here  the  model  structure  is 
contained in the principal component space and is determined from the sample correlation matrix $X^T X$.
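A missing measurement can be reconstructed from the principal component space as sketched below. The sketch assumes that X has already been centered and scaled, and it estimates the missing value as the one that minimizes the squared norm of the residual-space part of the measurement vector; the function names are illustrative and the code is not taken from [6, 7, 8].

```python
import numpy as np

def pca_loadings(X, s):
    """m x s matrix whose columns are the s leading eigenvectors of X^T X."""
    eigval, eigvec = np.linalg.eigh(X.T @ X)   # orthonormal eigenvectors
    order = np.argsort(eigval)[::-1]           # largest eigenvalues first
    return eigvec[:, order[:s]]

def reconstruct(x, i, P):
    """Estimate measurement i of vector x from the remaining measurements.

    C = P P^T projects onto the principal component space. The estimate
    minimizes ||(I - C) x||^2 over the unknown entry x[i], which requires
    C[i, i] < 1 (sensor i must not lie entirely inside the PCS).
    """
    C = P @ P.T
    x0 = np.array(x, dtype=float)
    x0[i] = 0.0                                # the faulty reading is not used
    return float(C[i] @ x0) / (1.0 - C[i, i])
```

Calling `reconstruct` with different indices `i` gives one soft-sensor per measurement, all derived from the same matrix `P`.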

In  Principal  Component  Regression  [24,  30]  a  regression  model  is  determined  by  regressing  the  soft- 
sensor  variable  on  the  principal  components  of  the  secondary  measurements.  Here  matrix  X  is  similar  to 
the  one  in  PCA,  but  contains  only  the  secondary  measurements  used  to  model  the  soft-sensor. 
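Principal Component Regression can be sketched in the same way. The example assumes mean-centered data and an ordinary least-squares regression on the first s principal components; the function names and the tuple used to hold the model are choices made for the illustration.

```python
import numpy as np

def pcr_fit(X, y, s):
    """Regress y on the s leading principal components of X.

    X: (n, m) secondary measurements; y: (n,) soft-sensor variable.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    eigval, eigvec = np.linalg.eigh(Xc.T @ Xc)
    P = eigvec[:, np.argsort(eigval)[::-1][:s]]   # m x s loadings
    T = Xc @ P                                    # principal components (scores)
    b, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return x_mean, y_mean, P, b

def pcr_predict(model, X):
    """Soft-sensor output for new secondary measurements X."""
    x_mean, y_mean, P, b = model
    return (X - x_mean) @ P @ b + y_mean
```

Because the principal components are uncorrelated, the regression remains well conditioned even when the secondary measurements themselves are strongly correlated.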

Clustering 

The advantages of dealing with a simple (parsimonious) model may sometimes be retained by
separating the operating range into clusters, each of which has a good simple model. Then, if the operating
point is identified as belonging to a given cluster, the corresponding model is used. Such a procedure has
been used in building a soft-sensor for concentrate grade measurement in a rougher flotation plant [21].
Much better results were obtained with clustering than with a single model determined
for the complete operating range. Clustering has also been used in a soft-sensor for estimating a variable
for which there is no on-line measurement in a distillation column [16, 19]. The models are radial
basis function neural networks, one determined for each of the clusters into which the secondary
measurement space is divided. Then, for a given operating point - given by a set of secondary variables at
a given sampling instant - the outputs of the neural networks are combined using fuzzy logic, according to
the membership grade of the set of secondary measurements in each of the clusters. Clustering has also
been used for a soft-sensor designed to model an event and the proximity to the occurrence of such an event
[M6]. In this case the objective was to give early warning of the event defined by the overload of a ball
mill in an industrial grinding plant. C-means clustering was used to group operating points of the plant.
Later developments led to retaining two clusters representing two different types of overload. The distance of
the operating point to the center of each of these clusters was computed on-line, as
an indication of how far the operation was from reaching one of the two overload conditions. Different
corrective actions could then be taken according to the distance to the nearest cluster.
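The combination of cluster models by membership grade can be sketched as follows. The membership formula is the usual fuzzy c-means expression, used here as an assumption since the cited works do not spell out their combination rule in this text; `local_models` stands for any callables (e.g., one trained network per cluster), and the distances `d` are the same quantities that would serve as the proximity indication described above.

```python
import numpy as np

def memberships(x, centers, fuzziness=2.0):
    """Fuzzy c-means membership grades of point x in each cluster (sum to 1)."""
    d = np.linalg.norm(centers - x, axis=1)   # distance to each cluster center
    if np.any(d == 0):                        # x coincides with a center
        u = (d == 0).astype(float)
        return u / u.sum()
    w = d ** (-2.0 / (fuzziness - 1.0))
    return w / w.sum()

def combined_output(x, centers, local_models):
    """Weight the output of each cluster model by its membership grade."""
    u = memberships(x, centers)
    outputs = np.array([model(x) for model in local_models])
    return float(u @ outputs)
```

An operating point midway between two centers receives the average of the two local model outputs, while a point at a center uses that cluster's model alone.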

Kalman  Filter 

Soft-sensor models using Kalman Filters have been used when the phenomenological knowledge about
the plant allows a convenient state/state-output model to be stated. Such is the case of a soft-sensor for
the fiber rate in a sugar cane mill [10]. A soft-sensor has also been tested using data from a pilot
semiautogenous grinding mill, with the aid of a phenomenological model [20]. In this case an Extended
(nonlinear) Kalman Filter had to be used, since a linearized state model had a very small region of validity
(see below). The estimated variables were the water, ore and fine ore contents of the mill. In addition the grinding
rate and the water and ore discharge rates were estimated.
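For reference, one cycle of the linear Kalman filter is sketched below; the matrices A, H, Q and R (state transition, measurement, process noise and measurement noise) are generic placeholders, not the models of [10] or [20]. In the extended filter, A and H are replaced at each step by the Jacobians of the nonlinear state and output functions.

```python
import numpy as np

def kalman_step(x, P, z, A, H, Q, R):
    """One predict/update cycle of a linear Kalman filter.

    x, P: previous state estimate and its covariance.
    z: vector of available measurements at this sampling time.
    """
    # Prediction with the state model.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Correction with the measurement.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

The states that are not measured directly, but are estimated through the model, play the role of the soft-sensor signals.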


PARAMETER  ESTIMATION  (TRAINING) 

For  a  given  model  structure,  the  model  parameters  (weights)  are  estimated  (training  in  the  case  of  neural 
networks)  for  an  operating  region,  normally  employing  gradient  techniques  to  minimize  a  function  of  an 
error  measuring  the  discrepancy  between  the  plant  measurement  and  the  estimated  measurement  obtained 
using  the  model.  A  model  structure  may  be  valid  for  a  large  region,  but  the  model  parameters  have  to  be 
updated  as  the  operating  point  moves  within  the  region.  The  result  being  sought  is  an  optimal  set  of  model 
parameters  (weights)  for  the  selected  model  structure.  The  scheme  of  Fig.  2  may  also  be  used  here, 
considering  now  that  the  model  structure  is  given  and  that  models  change  due  to  the  change  in  the 
parameter (weight) set. Parameter estimation is performed on-line or off-line, depending on the case. If the
change in operating point - within a specified time span - is such that the soft-sensor model parameters
(weights) undergo important changes while the plant is in normal operation, then they must be updated
on-line using available on-line measurements. This is an adaptive process. In this case the region of validity of
the model for a fixed set of parameters (i.e., the region in which the error criterion function is less than a
specified value) is small. This situation is found when a linear model is used for an appreciably
nonlinear plant. On the other hand, if a set of model parameters is valid through most of the normal
operating points of a plant within the specified time span, parameter estimation may be performed off-line.
In the case of nonlinear plants, this requires the models to be of the NARMAX class (preferably with
components having physical meaning), of the neural network class or of the phenomenological class.
According  to  the  availability  of  on-line  measurements,  two  basic  cases  may  be  distinguished: 

PE1 - Parameter estimation (training) using on-line measurements at each sampling time. This is the usual
case, where recursive parameter estimation is employed, but batch estimation may be used as well.

PE2 - Parameter estimation (training) using on-line or off-line measurements (e.g., analysis of samples in a
laboratory) at infrequent (relatively long) sampling periods for the variable being modeled.
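For case PE1, recursive estimation can be sketched with the standard recursive least-squares algorithm with a forgetting factor. The class name, the default forgetting factor and the initial covariance `p0` are assumptions for the example; the works cited here may use other recursive schemes.

```python
import numpy as np

class RecursiveLeastSquares:
    """Recursive least squares for a linear-in-parameters soft-sensor model."""

    def __init__(self, n_params, forgetting=0.99, p0=1e3):
        self.theta = np.zeros(n_params)     # parameter (weight) estimates
        self.P = p0 * np.eye(n_params)      # scaled parameter covariance
        self.lam = forgetting               # < 1 discounts old data

    def update(self, phi, y):
        """phi: components at this sampling time; y: actual sensor reading."""
        phi = np.asarray(phi, dtype=float)
        err = y - phi @ self.theta          # prediction error before the update
        gain = self.P @ phi / (self.lam + phi @ self.P @ phi)
        self.theta = self.theta + gain * err
        self.P = (self.P - np.outer(gain, phi @ self.P)) / self.lam
        return err

    def predict(self, phi):
        """Soft-sensor output for the given components."""
        return float(np.asarray(phi, dtype=float) @ self.theta)
```

While the real sensor is available `update` is called at every sampling time; when it fails, only `predict` is called and the parameters stay frozen at their last values.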

A model based on the slow sampling rate will in general be a bad model, since it is derived using samples
that do not represent the actual evolution of the variable involved. Instead, a model must be built using a
sampling rate that is sufficiently fast for the sampling theorem to be satisfied. If there are other correlated
secondary measurements that are sampled at an appropriately fast rate, there is a solution. A model is built
based on the fast sampling rate, but its parameters (weights) are found using the
infrequent samples provided by the measurement of the modeled variable together with the fast rate samples of
the correlated (secondary) measurements. For example, an instrumentation system for measuring grades in
a flotation plant is used in a time sharing mode to assay 14 samples piped to an X-ray analyzer. If the
analyzer takes 30 seconds to measure each mineral sample, the sampling period for any of the 14 measured
grades is 7 minutes (14 × 30 s). In a similar case there is a laboratory analysis performed infrequently on the modeled
variable, but due to the dynamics of the process a faster rate is required for good manual or automatic
control [2, 9, 10, 11, 12, 22] (see Table 1 below).

This method has been used for ARMAX type models in a stirred tank fermenter, and in a distillation
column [9]. Applications using this scheme with neural network soft-sensor models are reported in [12].
Another application of this method is found in composition soft-sensors in a distillation column using
radial basis function neural networks as models [19]. See also Table 1 below.
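The idea of estimating the parameters of a fast-rate model from infrequent samples can be sketched for the simplest case: a static linear model whose inputs are only secondary measurements. This restriction, and the function and argument names, are assumptions made for the illustration; the ARMAX and neural network schemes of [9, 12, 19] are more elaborate.

```python
import numpy as np

def fit_from_infrequent_samples(X_fast, y_slow, slow_index):
    """Estimate the parameters of a fast-rate model from infrequent samples.

    X_fast: (n, m) secondary measurements at the fast sampling rate.
    y_slow: (k,) values of the modeled variable (e.g., laboratory assays).
    slow_index: (k,) positions in the fast time base where y was sampled.
    """
    A = np.column_stack([X_fast[slow_index], np.ones(len(slow_index))])
    theta, *_ = np.linalg.lstsq(A, y_slow, rcond=None)
    return theta

def predict_fast(theta, X_fast):
    """Soft-sensor output at every fast sampling instant."""
    return np.column_stack([X_fast, np.ones(len(X_fast))]) @ theta
```

With secondary measurements every 30 s and one assay every 14 samples, `slow_index` would be `np.arange(0, n, 14)`, and the soft-sensor would still deliver a value every 30 s.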

Region of validity. Once a sensor fails, the parameter updating must cease, since it is no longer possible
to determine the error between the soft-sensor output and the actual measurement. The soft-sensor
then begins to perform its job, providing a virtual measurement in place of the missing one, using the secondary
measurements and/or composite components as inputs. It is usual to freeze the model parameter (weight)
set at some value determined before the sensor failure. But plant characteristics may change during the
period after the sensor failure, e.g. because the operating point moves due to disturbances or control
actions. If the model involves a simplification of reality, as is often the case, it may be expected that,
except in very simple instances, the model will not be able to represent the part of the actual plant to be
modeled for all possible operating points or operating trajectories. This is clearly the situation when, for the
sake of having a simple model, a class of linear models is selected to contain the candidates to represent
a nonlinear plant (or sub-plant). As the operating point changes the optimal parameter set changes, so that a
frequent updating of the model parameters may be required. But since the sensor signal is no longer
available this updating is no longer possible, and the soft-sensor error becomes unacceptable once its
region of validity has been exceeded. One way to palliate this degradation of performance is to increase the
region of validity of the soft-sensor model by incorporating more structure into it, e.g., using NARMAX or
neural network models having physically significant components or inputs (gray models), sufficiently
complex neural networks, and models incorporating phenomenological structure [1, 3, 10, 20].

Another way to extend the validity of the soft-sensor model after the sensor becomes unavailable at $t_f$ is by
predicting the future value of the model parameters using a parameter evolution sub-model, whenever appropriate
conditions are met so that the sub-model parameters may be estimated while the soft-sensor is working.
The sub-model is then used to give an optimal prediction of the future value of the soft-sensor model
parameters for $t > t_f$. The optimal prediction for a given parameter is its conditional expectation given its
value prior to the time of failure $t_f$ [15]. As an example, a first order sub-model for each parameter of a
soft-sensor for particle size measurement in an industrial grinding plant has been used. The prediction
evolves exponentially from the initial value at $t_f$ to its unconditional expected value for $t \gg t_f$. As
should be expected, the frozen parameter choice gives a better prediction for times near the fault, and the
unconditional expected value parameters give a better prediction for $t \gg t_f$ [15].
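The exponential evolution can be made explicit with a worked example. As a sketch, assume a stable first order discrete-time sub-model for a parameter $\theta$, with pole $a$ ($|a| < 1$), unconditional mean $\bar{\theta}$ and zero-mean white noise $w(t)$; these symbols are introduced here for illustration and are not taken from [15].

$$\theta(t+1) - \bar{\theta} = a\,[\theta(t) - \bar{\theta}] + w(t)$$

$$\hat{\theta}(t_f + k) = E[\theta(t_f + k) \mid \theta(t_f)] = \bar{\theta} + a^{k}\,[\theta(t_f) - \bar{\theta}], \qquad k = 1, 2, \ldots$$

For small $k$ the prediction stays close to the frozen value $\theta(t_f)$, while for large $k$ the term $a^{k}$ vanishes and the prediction tends to the unconditional mean $\bar{\theta}$.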


CONTROL  ASPECTS 

Soft-sensors  in  control  loops 

Soft-sensors  are  used  mainly  in  the  manual  or  automatic  control  of  a  plant.  But  partial  modeling  involved 
in  soft-sensor  design  may  cause  unexpected  control  performance  when  soft-sensors  are  used  in  automatic 
control  loops  because  of  two  circumstances:  (a)  change  of  the  plant-controller  model  structure  introduced 
because  of  the  soft-sensor  model  and  (b)  parameter  coupling  between  the  unmodeled  and  modeled  part  of 
the  plant:  (i)  inherent  coupling  of  input/output  model  parameters  due  to  changes  in  parameters  of  the 
state/state-output plant model, and (ii) a common disturbance causing changes of parameters both of the
modeled and the unmodeled part of the plant. Among the components selected by the structure
determination method there usually appear not only measurements of inputs to the plant
(controls or measured disturbances), but also measurements of intermediate variables that are
measured by sensors which are close to the real sensor to be replaced. In this way, only a part of the plant
is modeled, so that a control loop using the soft-sensor contains a modeled and an unmodeled part of the
plant. Coupling between the modeled and the unmodeled part of the plant may lead to off-specification
control performance - even instability - unless the fact that a soft-sensor is eventually going to replace the
actual sensor is considered in the design of the control loop [13, 14].

Differences  between  soft-sensor  and  control  models 

In order for a control model to be useful in control, it is required to predict the future values of the
controlled variable within a relatively short time interval, e.g. several settling times in Generalized Predictive
Control (GPC) strategies [31], or shorter periods for other control strategies, e.g., adaptive
PID+Feedforward control. The time span used to determine the model therefore need not be very large. On
the other hand, soft-sensors are required to predict the absent sensor signal for periods that may be
relatively long, until the real sensor signal is again available. Therefore, the time span used for the
determination of soft-sensor models may be much larger than in the control model case. In this way
different models may be obtained. For example, a slowly varying measured disturbance may be considered
constant during the time span of a control model, so that it becomes part of the constant component of the
model. But over the larger time span needed for the soft-sensor model, it may well happen that the
disturbance changes, and cannot be considered to be constant. Then it is liable to be included in the
soft-sensor model, while it is not included in the control model [23].


INDUSTRIAL  ASPECTS 

There is clearly a question of economics in deciding whether the back-up for a sensor is to be a spare sensor or
a soft-sensor. Justification for a soft-sensor depends, on the one hand, on the relative cost of a real back-up
sensor. For example, as a back-up for a wattmeter a spare wattmeter may be best, while for backing up
other, more complex sensors, a soft-sensor may be the best choice. Such would be the case in the replacement of a
particle size analyzer used in the minerals industry. On the other hand, a wattmeter may be directly
replaced by its spare, while in other cases - e.g. that of the particle size analyzer - an undue period of
time and effort may be required and, in the worst cases, even a shut down of a unit could be necessary.

From  an  industrial  point  of  view  a  soft-sensor  must  be  reliable  and  robust,  otherwise  its  main  purpose,  i.e., 
of  serving  as  an  alternative  for  obtaining  a  measurement  when  a  real  sensor  is  not  available,  is  not  fulfilled. 
Therefore in an industrial application the soft-sensor models must be embedded in systems that ensure
robustness of the soft-sensor performance. Before the measurements are used, they should undergo signal conditioning
(scaling, filtering), data reconciliation including elimination of outliers, etc. [16, 17, 32, 33]. Also, as in [3],
it may be convenient to have several soft-sensor models for the same soft-sensor signal, in case one of the
secondary measurements also fails. While the actual sensor is available, the error between the sensor and
soft-sensor signals should be monitored. If some index of this error exceeds a certain threshold, a system
must decide whether the cause is a faulty sensor or a change in the plant characteristics [6, 7, 8, 32,
33]. If the cause is a change in plant characteristics, e.g., because the operating point has left the
region of validity of the current set of model parameters, then the plant must undergo some excitation of
its inputs in order to estimate the new optimal parameter set. Having the plant under excitation conditions
at all times would be frowned upon by the operators, so this action should be resorted to only when it
is indispensable.
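The monitoring of the error between sensor and soft-sensor signals can be sketched as follows. The choice of index (mean squared error over a sliding window), the class name and its parameters are assumptions for the example; the decision between a sensor fault and a plant change, treated in [6, 7, 8, 32, 33], is not attempted here.

```python
from collections import deque

class ErrorMonitor:
    """Flags when the mean squared sensor/soft-sensor error exceeds a threshold."""

    def __init__(self, window, threshold):
        self.errors = deque(maxlen=window)   # most recent squared errors
        self.threshold = threshold

    def update(self, sensor_value, soft_sensor_value):
        """Store the newest error and return True if the index is too large."""
        e = sensor_value - soft_sensor_value
        self.errors.append(e * e)
        index = sum(self.errors) / len(self.errors)
        return index > self.threshold
```

A flag raised by `update` only signals that a decision is required; it would then trigger the fault diagnosis or, if necessary, a period of plant excitation for re-estimating the parameters.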


SUMMARY  OF  RESEARCH  AND  APPLICATIONS 

Soft-sensors are being increasingly applied in the process industries because of their ability to solve the
problem of the unavailability of sensors for any of the reasons already considered. Paralleling these
growing applications, there is quite a number of research results and ongoing research leading to the
design of soft-sensors themselves, as well as to fault detection and identification. Table 1 contains a
summary of a sample of the published literature concerning soft-sensors. Acronyms used in Table 1 are:

NN = Neural Network; RBF = Radial Basis Function; PCR = Principal Component Regression; PCA =
Principal Component Analysis; ARMAX = Autoregressive Moving Average with eXogenous input;
NARMAX = Nonlinear ARMAX.


Table  1.  Summary  of  Relevant  Literature  on  Soft  Sensors. 


| Ref. | Model class | Main features | Applications |
|---|---|---|---|
| 1 | NARMAX | Stepwise regression. Comparison between gray and black box (linear) soft-sensor models. Effects due to improper sampling period. | Tests in industrial grinding plant density and particle size. Sampling period effects tested in simulated simple plant. |
| 2 | Neural Network | Infrequent sampling. System implementation basics. General modeling approach. | Naphtha end boiling point for a residual fluidic cracking unit. |
| 3 | NARMAX | Stepwise regression. Robustness to secondary measurement failures. Gray models using phenomenological components. | Tests in particle size distribution in industrial grinding plant. |
| 4 | NARMAX | Stepwise regression. Gray models. Availability index increased through a system of soft-sensors. | Tests in pulp density in industrial grinding plant. |
| 5 | Neural Network | RBF properties. RBF polynomial expansions. Correlated noise. Validity tests. | Liquid level. Diesel engine speed. |
| 6 | PCA | Missing measurement reconstruction (soft-sensing) for sensors within a set of sensors. Types of faults. Fault detection. Identification of failed sensor through a sensor validity index. Effect of filtering. Separating sensor fault detection from plant changes. | Data from boiler process with a 9-sensor set. |
| 7 | PCA | Geometric approach of subjects in FD3 and FD6 provides intuitive, although rigorous, point of view. | |
| 8 | PCA | Abridged version of [8]. | |
| 9 | ARMAX | Infrequent sampling. Adaptation. | Biomass concentration, distillation tower top product composition, melt flow index in polymerization reactor. |
| 10 | Kalman Filter | Infrequent sampling. Use of phenomenological knowledge. | Fiber rate in sugar cane mill. |
| 11 | Neural Network | Infrequent (4 day) sampling. From ARMAX and NARMAX to radial basis function NN models. Determination of network topology and placement of RBF centers. | Biomass estimation in industrial process. |
| 12 | Neural Network | Increasing use of soft-sensors in industrial applications reported. | Distillation column kerosene flash point and distillate flash point. Polymer melt index in industrial plant. Cardboard strength and porosity test for a liner board machine. pH for neutralization circuit in gold extraction. |
| 13 | | Effect of plant parameter changes when the control loop includes a soft-sensor. | |
| 14 | | Design problem when soft-sensor replaces sensor in control loop. | |
| 15 | ARMAX | Comparison of soft-sensors with frozen, optimal and average parameters. | Density in industrial grinding plant. |
| 16 | Neural Network | Clustering. Infrequent sampling. System design features. Sensor selection by Singular Value Decomposition. | Compositions in distillation column top and bottom. |
| 17 | Expert System | System implementation details. | Cooking time and kappa number in batch pulping process in pulp mill. |
| 18 | NARMAX | Stepwise regression. Gray model. | Test for solids concentration in hydrocyclone overflow in industrial grinding plant. |
| 19 | Neural Network | Radial basis NN. Infrequent sampling. Clustering, with fuzzy combination of outputs of cluster models. | Propylene and propane composition in distillation column. |
| 20 | Kalman Filter | Extended Kalman Filter. Nonlinear state model based on phenomenological knowledge. Improvement over linear model using linear Kalman Filter. | Pilot SAG mill with aid of phenomenological model. Water, ore and fine ore content. Estimated parameters: grinding rate, water and ore discharge rates. |
| 21 | ARMAX | Stepwise regression. Comparison of soft-sensors with and without clustering. Soft-sensor based exclusively on clustering. | Tests in industrial rougher flotation plant grades. Tests of mill overload in industrial grinding plant. |
| 22 | NN | Infrequent sampling. Recurrent NN. | Simulation tests in simple plant. |
| 23 | NARMAX | Stepwise regression. Gray models. Comparison between soft-sensor and control models. | Tests in industrial rougher flotation plant concentrate grade. |
| 25 | PCA, PCR, NN | Fault detection and identification of the set of sensors using sensor validity index. Actual and reconstructed measurements used as input to model. | Test in gas emission monitoring of industrial boiler. |
| 26 | Neural Network | Recurrent neural networks. Influence of noise on model and training algorithms. | Tests using simulated abstract nonlinear plant. |
| 27 | Neural Network | Identity NN used for estimating faulty measurements followed by soft-sensor model. Comparisons in terms of missing data. | Recovery of light component in bottom of distillation column. |

CONCLUSIONS 

Soft-sensors are being increasingly applied in process industries. At the same time there is a considerable
amount of research and development on modeling and on the associated subject of fault detection
and identification, sometimes with tests performed in pilot plants, and with tests and permanent
applications in industrial plants. All classes of models are being used: regression and multivariable
correlation models, neural network models, fuzzy logic models, and combinations of them,
including the incorporation of phenomenological knowledge.

Concerning modeling, a unified point of view covering regression or correlation models, neural
networks, etc., appears to be lacking. One consequence of this is that different names are given to the
same concepts, e.g.: parameters = weights, parameter estimation = training, prediction ability =
generalization, etc. Confusion sometimes even arises because variables are given the name of
"parameters", most probably because parameters are already called weights in the neural network
literature. In addition, some results which are already common knowledge in the case of ARMAX and
NARMAX type models may be extended to the case of neural networks, e.g., the dependence of training
and parameter estimation algorithms on whether the disturbances are white or colored noise [5, 26].
It would be useful to develop further the unified approach initially attempted in this paper.
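
The correspondence between "parameter estimation" and "training" noted above can be made concrete with a toy model. The sketch below is only an illustration under stated assumptions: it uses a first-order ARX model (the noise-free special case of ARMAX), and the coefficients, input signal and function names are hypothetical and are not taken from any of the cited soft-sensors. Estimating (a, b) by least squares plays the same role as training weights, and checking the fitted model on fresh data corresponds to testing generalization.

```python
import math


def simulate_arx1(a, b, u, y0=0.0):
    """Simulate y[k] = a*y[k-1] + b*u[k-1] for a given input sequence u."""
    y = [y0]
    for k in range(1, len(u)):
        y.append(a * y[k - 1] + b * u[k - 1])
    return y


def fit_arx1(u, y):
    """Least-squares estimate of (a, b) via the 2x2 normal equations."""
    s_yy = s_yu = s_uu = s_y1 = s_u1 = 0.0
    for k in range(1, len(y)):
        yp, up, yk = y[k - 1], u[k - 1], y[k]
        s_yy += yp * yp
        s_yu += yp * up
        s_uu += up * up
        s_y1 += yp * yk
        s_u1 += up * yk
    det = s_yy * s_uu - s_yu * s_yu
    if det == 0.0:
        raise ValueError("input data do not excite the model sufficiently")
    a_hat = (s_y1 * s_uu - s_u1 * s_yu) / det
    b_hat = (s_u1 * s_yy - s_y1 * s_yu) / det
    return a_hat, b_hat


# Hypothetical "plant" and input signals, chosen only for illustration.
A_TRUE, B_TRUE = 0.8, 0.5
u_train = [math.sin(0.7 * k) + 0.5 * math.cos(1.9 * k) for k in range(60)]
y_train = simulate_arx1(A_TRUE, B_TRUE, u_train)

a_hat, b_hat = fit_arx1(u_train, y_train)   # "parameter estimation" = "training"

# "Prediction ability" = "generalization": compare on an input not used for fitting.
u_test = [math.cos(0.3 * k) for k in range(40)]
y_test = simulate_arx1(A_TRUE, B_TRUE, u_test)
y_pred = simulate_arx1(a_hat, b_hat, u_test)
max_error = max(abs(p - t) for p, t in zip(y_pred, y_test))
print(f"a_hat = {a_hat:.6f}, b_hat = {b_hat:.6f}, max test error = {max_error:.2e}")
```

With coloured disturbances, plain least squares of this kind is generally biased, which is the dependence on noise colour mentioned in the text.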

Although the published literature shows considerable effort devoted to soft-sensor modeling and to fault
detection and identification, not much is being done on the effects produced by the
substitution of a sensor by a soft-sensor. The fact that the soft-sensor model may result in a partial model
of the plant using intermediate plant measurements introduces problems in the overall control loop, which
should be further investigated.

The good results obtained by introducing phenomenological components as inputs to regression models
suggest that a similar approach should be investigated for other modeling methods, such as neural
networks and principal component analysis models.




ACKNOWLEDGEMENTS

The author is most grateful to Aldo Casali, Gianna Vallebuona (Mining Eng. Dept., U.
of Chile) and Ricardo Barrera (Electrical Eng. Dept., U. of Chile) for their contributions to the work on soft-sensors.


REFERENCES

1. Gonzalez, G.D., Odgers, R., Barrera, R., Casali, A., Torres, F., Castelli, L., Gimenez, P. (1995) Soft-sensor
design considering composite measurements and the effect of sampling periods. Proc. Copper
95, International Conference, Santiago, Chile, II, 213-224.

2. J.C. Wang and S.Q. Wang, 1996. Neural soft-sensor for the RFCCU's fractionator naphtha endpoint.
INCONIP '96, Hong-Kong, 2, 1164-1168.

3. A. Casali, G. Gonzalez, F. Torres, G. Vallebuona, L. Castelli, P. Gimenez, 1998. Particle size
distribution soft-sensor for a grinding circuit. Powder Technology, Vol. 99, pp. 15-20.

4. Casali, A., Gonzalez, G., Torres, F., Cerda, I., Castelli, L., and Gimenez, P., 1995. Pulp density soft-sensor
for a grinding circuit. Proc. XXV APCOM, Brisbane, Australia, 371-376.

5. S. Chen, S.A. Billings, 1990. Practical identification of NARMAX models using radial basis functions.
Int. J. Control, 52(6), 1327-1350.

6. R. Dunia, S.J. Qin, T.F. Edgar, T.J. McAvoy, 1996. Identification of faulty sensor using principal
component analysis. AIChE Journal 42(10), 2797-2812.

7. R. Dunia and S.J. Qin, 1998. A unified geometric approach to process and sensor fault identification
and reconstruction: the unidimensional fault case. Computers Chem. Eng. 22(7-8), 927-943.

8. R. Dunia, S.J. Qin, T.F. Edgar, and T.J. McAvoy, 1996. Sensor fault identification and reconstruction
using principal components. Proc. 13th Triennial IFAC World Congress, San Francisco, N, 259-264.

9. M.T. Tham, G.A. Montague, A.J. Morris, and P.A. Lant, 1991. Soft-sensors for process estimation and
inferential control, J. Proc. Control, 1, 3-14.

10. S. Crisafulli, R.D. Pierce, G.A. Dumont, M.S. Ingegneri, J.E. Seldon, and C.B. Baade, 1996.
Estimating sugar cane fibre rate using Kalman filtering techniques. Proc. 13th IFAC Triennial World
Congress, San Francisco, USA, 361-366.

11. A.G. Hofland, A.J. Morris and G.A. Montague, 1992. Radial basis function networks applied to process
control. Proc. American Control Conf., 1, 480-483.

12. G. Martin, 1997. Soft-sensors get the process data you really want. Control & Instrument., K, Jan., 45.

13. G.D. Gonzalez, R. Odgers, 1996. Issues in the design of control loops using soft-sensors. Proc. 13th
IFAC Triennial World Congress, San Francisco, USA, A, 499-504.

14. G. Gonzalez, M. Gonzalez P., J.C. Cartes, 1992. Control problems due to replacement of sensors by
soft-sensors. Advances in Instrumentation and Control, Proc. ISA/92 International Conference and
Exhibition, Houston, USA, Oct. 1992, v. 47, part 2, 1193-1200.

15. G.D. Gonzalez, M.A. Aguilera and L. Castelli, 1993. Development of a density soft-sensor for a
mineral grinding plant. Prepr. 12th IFAC World Congress, Sydney, Australia, 5, 355-358.

16. Fuzzy neural nets based soft sensor and its applications, 1994. Int. Conf. on Data and Knowledge
Systems for Manufacturing and Engineering, 2, 503-508.

17. M. Rao, J. Corbin and Q. Wang, 1993. Soft sensor for quality prediction in batch chemical pulping
process. Proc. IEEE Int. Symp. on Intelligent Control, 150-155.

18. A. Casali, G. Vallebuona, M. Bustos, G. Gonzalez, P. Gimenez, 1998. A soft-sensor for solids
concentration in hydrocyclones. Minerals Engineering 11(4), 375-383.

19. X. Wang, R. Luo and H. Shao, 1996. Designing a soft sensor for a distillation column with the fuzzy
distributed radial basis function neural network. Proc. IEEE 35th Conf. on Decision and Control, Kobe,
Japan, 1714-1719.

20. R. Amestica, G. Gonzalez, J. Menacho and J. Barria, 1993. On-line estimation of fine and coarse ore,
water, grinding rate and discharge rates in semiautogenous grinding mills. Proc. XVIII International
Mineral Processing Congress, Sydney, Australia, 1, 109-115.

21. Espinoza, P.A., G.D. Gonzalez, A. Casali, C. Ardiles, 1995. Design of soft-sensors using cluster
techniques. Proc. International Mineral Processing Congress, Oct. 22-27, San Francisco, 1, 261-265.

22. Y. Yang and T. Chai, 1997. Soft sensing based on artificial neural networks, Proc. American Control
Conference, Albuquerque, New Mexico, 1, 674-678.

23. G.D. Gonzalez and J.P. Redard, 1994. Adaptive models for soft-sensors and control for a rougher
flotation plant. Proc. of ISA '94 Conference and Exhibit, Anaheim, CA, Oct. 3, 1143-1152.

24. R.H. Myers, 1990. Classical and modern regression with applications, 2nd Edition, Duxbury Press,
Belmont, California.

25. S.J. Qin, H. Yue and R. Dunia, 1997. A self-validating inferential soft-sensor for emission monitoring.
Proc. American Control Conf., Albuquerque, NM, 473-477.

26. O. Nerrand, P. Roussel-Ragot, D. Urbani, L. Personnaz and G. Dreyfus, 1994. Training recurrent neural
networks: Why and how? An illustration of dynamic process modeling. IEEE Trans. on Neural
Networks, 5(2), 178-184.

27. K. Meert, 1996. A real-time recurrent learning network structure for dealing with missing sensor
data. IEEE, 1600-1605.

28. L. Ljung, 1987. System Identification: Theory for the User, P.T.R. Prentice Hall, Information and
System Sciences Series, New Jersey.

29. R. Haber and H. Unbehauen, 1990. Structure identification of nonlinear dynamic systems - A survey
on input-output approaches, Automatica, 26(4), 651-677.

30. J.E. Jackson, 1991. A user's guide to principal components, John Wiley, New York.

31. R. Bitmead, M. Gevers and V. Wertz, 1990. Adaptive Optimal Control: The Thinking Man's GPC,
Prentice Hall International Series in Systems and Control Engineering, New Jersey.

32. R. Barrera, G. Gonzalez, A. Casali, G. Vallebuona, 1996. SENVIR: A soft-sensor system for
industrial applications (In Spanish). Project FONDEF MI-17, Dept. of Elect. Eng., University of Chile.

33. R. Isermann, 1994. Integration of fault detection and diagnosis methods, Prepr. IFAC Symposium on
Fault Detection, Supervision and Safety for Technical Processes: SAFEPROCESS, 2, 597-612.




J.  Keith  Brimacombe  Memorial  Symposium: 
Intelligence  in  Materials  Process  Engineering 




In  Memory  of  J.  Keith  Brimacombe: 

The  Pursuit  of  Quality  in  the  Casting  of  Materials 

I.V.  Samarasekera 


The  Centre  for  Metallurgical  Process  Engineering 
Advanced  Materials  and  Process  Engineering  Laboratory 
2355  East  Mall,  University  of  British  Columbia.  Vancouver,  B.C.  Canada 

Email: indira@cmpe.ubc.ca


There are few of us who have impacted the field of metallurgical process engineering in the 20th Century to
the  extent  of  the  late  Keith  Brimacombe.  In  a  remarkable  career,  that  spanned  three  decades,  he  changed 
the  field  irrevocably.  Dr.  Keith  Brimacombe  applied  the  power  of  combining  mathematical  models, 
sophisticated  laboratory  measurements  and  ingenious  plant  trials  to  analyse  complex  processes  in  both  the 
non-ferrous  and  steel  industries.  Of  his  many  contributions  to  the  discipline,  his  work  in  the  pursuit  of 
quality  in  commercial  casting  processes  stands  out.  It  transformed  our  understanding  of  many  processes 
and  illustrated  the  importance  of  an  interdisciplinary  approach.  Keith  Brimacombe  was  passionate  about 
the  need  to  break  down  walls  that  exist  between  industries,  between  universities  and  industry,  and  between 
the  scientist  or  engineer  in  the  laboratory  and  the  shop  floor  worker  in  whose  hands  lies  the  challenge  of 
transforming  knowledge  into  wealth. 

This paper traces Dr. Brimacombe's contribution to the continuous casting of steel, the D.C. casting of zinc
and  aluminum,  the  static  casting  of  fused  cast  refractories,  and  ingot  casting.  The  paper  also  seeks  to 
illustrate  his  philosophy  of  cross-fertilization  of  knowledge  between  industries,  and  of  the  power  to  apply 
lessons  learned  from  one  field  to  solve  real-life  problems  in  another.  Through  the  use  of  mathematical 
models  and  judicious  in-plant  measurements,  Keith  Brimacombe  and  his  students  elucidated  the  formation 
mechanism  of  numerous  defects  in  these  casting  processes  and  prescribed  measures  to  eliminate  them.  Not 
content  to  simply  publish  his  findings  in  journals,  he  went  further  by  spearheading  measures  to  transfer 
technology  to  the  shop  floor,  through  short  courses,  through  in-plant  trials  which  implemented  his  ideas, 
and  through  the  development  of  expert  systems.  It  was  his  dream  to  build  an  "intelligent  continuous  casting 
process"  equipped  with  sensors  and  a  "smart"  system  to  monitor  the  formation  of  defects  and  prescribe 
corrective  measures  on  line.  These  ideas  are  being  pursued  vigorously  world  wide  and  it  is  up  to  those  of 
us  who  follow  in  his  foot-steps  to  make  them  realities. 


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




Towards  Intelligent  Steel  Processing 

Rian  Dippenaar 

BHP  Institute  for  Steel  Processing  and  Products,  University  of  Wollongong,  Northfields 
Avenue,  Wollongong,  NSW  2522,  Australia 


ABSTRACT 

In  recent  times  there  has  been  a  rapid  increase  in  the  use  of  dynamic  models  and  artificial  intelligence  to 
control  steelmaking  processes.  These  developments  have  created  a  heightened  awareness  of  the  importance 
of  developing  diagnostic  sensors  to  provide  input  into,  or  to  validate  process  models.  Of  fundamental 
importance  in  various  steelmaking  processing  steps  is  the  control  of  the  oxygen  potential.  Reliable  and 
accurate  sensing  of  the  oxygen  potential  is  a  mandatory  requirement  if  intelligent  processing  techniques  are 
to  be  used.  The  late  Keith  Brimacombe  pioneered  the  validation  of  many  mathematical  models  and  sensing 
devices  designed  for  metallurgical  process  control  by  experimental  measurement  in  the  laboratory  and  on 
production  plants.  I  have  elected  to  honour  him  by  emphasising  the  importance  of  the  accurate 
measurement  of  the  oxygen  potential  in  liquid  steel.  The  significance  and  experimental  measurement  of 
electronic  conduction  in  electrochemical  sensors  will  be  highlighted,  with  special  reference  to  the  small 
zirconia  sensors  used  in  commercial  practice. 

INTRODUCTION 

The  late  Keith  Brimacombe  passionately  pursued  the  development  of  quantified  links  between  process  and 
product.  He  argued  that  measurements  of  processes  give  us  understanding  and  knowledge,  while 
mathematical  models  provide  the  framework  to  assemble  the  knowledge  and  to  apply  it  quantitatively  to 
link  process  behaviour  and  product  properties.  His  refrain  that  validation  of  mathematical  and  computer 
models  by  experimental  measurement  is  at  the  core  of  the  success  of  intelligent  processing,  will  always 
remain with me. Hence, I have elected to honour him in this special session of the Conference by
concentrating on validation: in this instance not on the validation of a mathematical model as such, but on
the validation of the measurement of the oxygen potential in liquid iron and steel. This measurement and
the validation thereof are important, especially because many decisions made in the course of the various
steelmaking processes hinge on a proper knowledge of the oxygen potential in the steel melt which, in turn,
is pivotal information required for the effective use of mathematical process models employed as elements
of the larger artificial intelligence systems.

Following  the  revolution  in  the  engineering  world  brought  about  by  Henry  Bessemer's  invention  of  a 
pneumatic  steelmaking  process  in  1856,  great  strides  have  been  made  in  process  efficiency  and  product 
quality. Modern oxygen steelmaking processes are capable of producing 1000 tons of steel per hour and by
the use of secondary processing techniques the impurity content can be reduced to less than 50 ppm.
Suspension bridge cables with a tensile strength of 1.8 GPa have been used while the tensile strength of
steel tyre cord is in excess of 3.5 GPa and steel sheet is drawn down to thicknesses of less than 0.1 mm for
the  large  scale  production  of  beverage  cans.  These  major  achievements  have  been  brought  about  by  the 
integration  of  knowledge  rooted  in  thermodynamics,  reaction  kinetics,  fluid  dynamics  and  a  multitude  of 
other  disciplines. 

Recently,  great  strides  have  been  made  towards  the  development  of  intelligent  processing  techniques, 
leading  to  yet  further  improvements  in  processing  efficiency  and  product  quality.  The  availability  of  high 
speed  personal  computers  has  led  to  the  rapid  development  of  computerised  mathematical  models  with  an 
emphasis  on  describing  rigorously  either  individual  process  steps  or  integrating  the  process-product 
interrelationships.  These  models  incorporate  the  basics  of  conservation  and  the  laws  of  physics  and 
chemistry  mathematically  and,  once  properly  formulated  and  validated,  predict  product  properties  reliably 
and quantitatively as a function of process conditions [1]. In a related but different kind of approach we
have seen the development of dynamic process models which combine heuristic and current process data
with fundamental knowledge to describe individual process steps and then employ artificial intelligence
to integrate all of this into a unified whole.

This  drive  towards  quantitative  analysis  and  control  of  steelmaking  processes  has  created  a  heightened 
awareness  of  the  importance  of  developing  diagnostic  sensors  to  provide  input  into  or  to  validate  process 
models.  Of  fundamental  importance  in  various  steelmaking  processing  steps  is  the  control  of  the  oxygen 
potential.  Not  only  does  the  oxygen  potential  determine  the  behaviour  of  various  alloying  elements,  but  it 
also impacts directly on the carbon, phosphorus, sulphur, nitrogen and hydrogen contents of the steel
product.  It  is  therefore  not  surprising  that  considerable  attention  has  been  dedicated  to  the  measurement  of 
the  oxygen  potential  in  liquid  steel,  to  the  extent  that  instantaneous  oxygen  sensing  has  become  an 
indispensable  tool  for  process  control.  Moreover,  in  pursuit  of  modernisation  and  automation  of 
continuous  casting  and  ladle  processing  operations,  it  is  not  only  the  instantaneous  measurement  that  is 
important  but  also  the  on  line  monitoring  and  control  of  the  oxygen  potential.  Consequently,  much  effort 
has been expended to increase the lifetime and reliability of oxygen sensors. Reliable and accurate sensing
of  the  oxygen  potential  is  a  mandatory  requirement  if  intelligent  processing  techniques  are  to  be  used  for 
the control of modern and advanced steelmaking processes. It is therefore of the utmost importance to
assess  the  validity  and  accuracy  of  such  oxygen  activity  measurements  and  the  extent  to  which  this 
information  can  be  used  in  our  quest  to  quantify  and  automate  steelmaking  processes. 

THE  PRACTICAL  USE  OF  OXYGEN  SENSORS 

Steelmaking  is  an  oxidation  process  in  which  the  oxidisable  elements  in  the  melt  are  selectively  transferred 
to  the  slag  whilst  decarburisation  occurs  at  the  same  time.  The  transfer  of  silicon  from  the  metal  to  the  slag 
may  be  expressed  as: 


$$ [\mathrm{Si}] + 2[\mathrm{O}] = (\mathrm{SiO_2}) \tag{1} $$

where [x] signifies an element in infinitely dilute solution and (y) a compound in the slag,
or by the thermodynamically equivalent expression:

$$ [\mathrm{Si}] + 2(\mathrm{FeO}) = (\mathrm{SiO_2}) + 2\,\mathrm{Fe} \tag{2} $$

and similar equations apply to the oxidation of other elements.

From a fundamental point of view, the progress of such multi-component slag-metal reactions can be
described  by  incorporating  the  thermodynamics  and  process  dynamics  of  the  system.  For  example,  Oeters 
and  Xie  [2]  have  shown  that  in  a  reaction  such  as  Reaction  2,  the  thermodynamic  equilibrium  needs  to  be 
established,  four  flux  equations  need  to  be  developed  and  three  continuity  equations  need  to  be  complied 
with. As the equilibrium constant is a function of the slag composition, this dependence also needs to be
known  before  the  kinetic  analysis  can  be  done.  When  the  flux  equations  are  integrated  over  time,  an 
equation  system  for  each  component  is  obtained  which  yields  the  concentration  versus  time  curves  for  each 
component  in  the  bulk  phases. 

The  silicon  oxidation  reaction  is  only  one  of  many  oxidation  reactions  occurring  simultaneously  in  the  steel 
bath  and  each  one  of  these  reactions  has  to  be  analysed  in  this  fashion.  However,  a  common  denominator  in 
all these reactions is the oxygen potential in the liquid pool as indicated in Equation 1, so that a knowledge
of  the  oxygen  potential  of  the  liquid  metal  is  a  prerequisite  for  a  proper  understanding  of  the  progress  of 
these  reactions  and  hence,  for  process  control.  For  this  reason  oxygen  sensors  have  been  developed  which 
can  measure  the  oxygen  potential  of  the  steel  bath  instantaneously.  A  vast  amount  of  work  has  been  done 
on the development and application of such oxygen sensors and many literature reviews have been
compiled, for example by McLean [3], Iwase [4] and Janke [5]. No attempt will be made to review the
literature,  the  vast  portfolio  of  techniques  employed,  or  the  underpinning  theory,  save  to  put  the  current 
discussion  into  context. 




The  use  of  oxygen  sensors  employing  stabilised  zirconia  solid  electrolytes  to  measure  the  oxygen  activity 
in molten steel in the various stages of steelmaking, has become standard practice. The tube design, shown
in Figure 1a, is typically used for instantaneous measurements while plug-type designs, such as shown in
Figure 1b, are employed for long term measurements in an attempt to minimise polarisation. Tu and Janke
[6] used an even more advanced sensor to monitor the oxygen potential continuously during vacuum
treatment of liquid steel for periods of longer than three hours. More recently, oxygen concentration cells
have also been used to measure the activity of components other than oxygen in molten steel, for example
silicon, chromium, titanium and phosphorus [4, 5, 7]. Such measurements have been made possible by the
incorporation of an auxiliary electrode in the sensor. An example of such an auxiliary electrode is shown in
Figure 1c, the details of which will be discussed below.


[Figure 1, panels (a), (b) and (c). Labels in the drawings: Mo-wire; ZrO2(MgO) solid electrolyte; reference
electrode; auxiliary electrode. Numbered parts: 1. Molybdenum wire; 2. Alumina cement; 3. Alumina tube;
4. Alumina powder; 5. Reference powder; 6. Solid electrolyte tube; 7. Solid electrolyte rod.]

Fig. 1. Schematic diagrams of oxygen concentration cells.
(a) ZrO2-tube type; (b) Plug type; (c) Auxiliary type


THE  THEORETICAL  FRAMEWORK 


Ideal  electrochemical  cells 

The  oxygen  sensor  which  is  widely  used  in  the  steel  industry  is  essentially  an  electrochemical  oxygen 
concentration cell, such as shown in Figure 1a, and which may be expressed as:


$$ p_{O_2}(\mathrm{ref}) \;|\; \text{solid electrolyte} \;|\; p_{O_2}(\mathrm{slag}) \tag{3} $$

Such  a  cell  usually  comprises  a  reference  electrode  of  known  oxygen  potential,  a  solid  electrolyte,  for 
instance  magnesia  stabilised  zirconia,  and  a  liquid  metal  electrode,  the  oxygen  potential  of  which  is  to  be 
determined.  The  output  signal  of  such  an  electrochemical  cell  is  electrical  voltage,  which  is  related  to  the 
oxygen activity through the Nernst equation as follows:

$$ E = \frac{RT}{nF} \ln \frac{p''_{O_2}}{p'_{O_2}} \tag{4} $$

where $p'_{O_2}$ and $p''_{O_2}$ are the oxygen partial pressures at the two electrodes respectively, E is the
open-circuit emf of the galvanic cell, R is the gas constant, T the temperature (K), n the number of electrons
participating in the exchange reaction and F is Faraday's constant.

The  partial  pressure  of  the  oxygen  in  the  liquid  metal  is  related  to  the  oxygen  activity  through  the 
thermodynamic  equilibrium  constant  and  hence,  the  oxygen  activity  in  the  liquid  metal  can  be  determined 
directly  from  the  open-circuit  potential  of  the  cell. 
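
As a numerical illustration of Equation 4, the following minimal sketch computes the ideal open-circuit emf for two oxygen partial pressures and then inverts the relation to recover an unknown pressure from a measured emf. The function names and the numerical values (temperature, pressures) are assumptions chosen only for illustration; they are not data from this paper. The sketch takes n = 4 and treats the known (reference) pressure as the p'' side of Equation 4.

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_emf(p2, p1, temp_k, n=4):
    """Equation 4: ideal open-circuit emf (V) for oxygen pressures p2 (p'') and p1 (p')."""
    if p1 <= 0.0 or p2 <= 0.0 or temp_k <= 0.0:
        raise ValueError("pressures and temperature must be positive")
    return R * temp_k / (n * F) * math.log(p2 / p1)


def unknown_pressure(emf, p2, temp_k, n=4):
    """Invert Equation 4 for p1 (p') when p2 (p'') is the known reference pressure."""
    return p2 * math.exp(-n * F * emf / (R * temp_k))


# Hypothetical values, for illustration only.
T_BATH = 1873.0     # K
P_REF = 1.0e-10     # reference-side oxygen pressure (atm)
P_MELT = 1.0e-14    # melt-side oxygen pressure (atm)

emf = nernst_emf(P_REF, P_MELT, T_BATH)
print(f"ideal emf = {emf:.4f} V")
print(f"recovered melt p_O2 = {unknown_pressure(emf, P_REF, T_BATH):.3e} atm")
```

Converting the recovered partial pressure into an oxygen activity additionally requires the thermodynamic equilibrium constant mentioned above, which is not given here and is therefore left out of the sketch.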

By  using  a  solid  state  zirconia  electrolyte  in  conjunction  with  an  auxiliary  electrode,  it  has  also  been 
possible  to  measure  the  activity  of  alloying  elements  in  liquid  steel.  For  example,  the  activity  of  silicon  in 
the  liquid  hot-metal  product  of  the  blast  furnace  can  be  measured  by  the  addition  of  an  auxiliary  electrode 
firstly painted and then sintered onto a zirconia tube in the fashion shown in Figure 1c [4,7,8].


The electrochemical cell may be expressed as:


$$ \mathrm{Mo} \;|\; (\mathrm{Mo} + \mathrm{MoO_2}) \;|\; \mathrm{ZrO_2(MgO)} \;|\; (\mathrm{SiO_2}) \;|\; \mathrm{Si\ (in\ Fe)} \tag{5} $$

The oxygen potential at the three-phase boundary (electrolyte + molten iron + auxiliary electrode) is
established by the equilibrium reaction

$$ \mathrm{ZrO_2\,(s)} + \mathrm{Si\ (in\ Fe)} + \mathrm{O_2\,(g)} = \mathrm{ZrSiO_4\,(s)} \tag{6} $$


for which


$$ \ln K = -\ln h_{Si} - \ln p_{O_2} \tag{7} $$

where $h_{Si}$ is the silicon activity in molten iron referred to a 1 mass % solution.

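Substitution of the value of $p_{O_2}$ from this equation into Equation 4 yields the open-circuit cell potential as


$$ E = \frac{RT}{4F} \ln \left[ p_{O_2}(\mathrm{ref})\, K\, h_{Si} \right] \tag{8} $$

where $p_{O_2}(\mathrm{ref})$ is the oxygen partial pressure at the reference electrode.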


Hence  the  activity  of  silicon,  and  in  similar  fashion  that  of  other  alloying  elements  in  liquid  metal,  can  be 
determined  directly. 
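
The chain from measured emf to silicon activity can be sketched numerically as follows. This is a hedged illustration, not the procedure of references [4,7,8]: it assumes an ideal electrolyte (Equation 4 with n = 4), assumes that the reference electrode is the higher-potential (p'') side, and takes the equilibrium constant K of Reaction 6 and the reference oxygen pressure as given inputs. All numerical values and function names are hypothetical.

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol


def boundary_p_o2(emf, p_ref, temp_k):
    """Oxygen pressure at the three-phase boundary from Equation 4 (n = 4, reference on the p'' side)."""
    return p_ref * math.exp(-4.0 * F * emf / (R * temp_k))


def silicon_activity(emf, p_ref, k_eq, temp_k):
    """h_Si from Equation 7, ln K = -ln h_Si - ln p_O2, i.e. h_Si = 1 / (K * p_O2)."""
    return 1.0 / (k_eq * boundary_p_o2(emf, p_ref, temp_k))


# Hypothetical inputs, for illustration only.
T_HOT_METAL = 1773.0   # K
P_REF = 1.0e-12        # oxygen pressure fixed by the reference electrode (atm)
K_EQ = 1.0e16          # assumed equilibrium constant of Reaction 6
H_SI_TRUE = 0.5        # assumed silicon activity used to generate a synthetic emf

p_boundary = 1.0 / (K_EQ * H_SI_TRUE)                                   # Equation 7
emf = R * T_HOT_METAL / (4.0 * F) * math.log(P_REF / p_boundary)        # Equation 4
print(f"synthetic emf = {emf:.4f} V")
print(f"recovered h_Si = {silicon_activity(emf, P_REF, K_EQ, T_HOT_METAL):.4f}")
```

In practice K depends on temperature and must be taken from thermodynamic data, and the electronic conduction of the electrolyte discussed in the next section would also have to be accounted for.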


Practical  electrochemical  cells 

The Nernst equation above relates the true emf of the galvanic cell to the oxygen concentration difference
across the solid electrolyte only in the ideal case when the solid electrolyte exhibits pure ionic conduction.
In practice zirconia solid electrolytes exhibit ionic as well as n-type electronic conduction when used at
high temperature and low oxygen potential. Wagner [9], in appreciation of this difficulty, developed a
relationship between the emf and the oxygen potential of an electrochemical cell containing a solid
electrolyte which exhibits mixed ionic and electronic conduction. He showed that the oxygen potentials
at the two electrodes, $\mu'_{O_2}$ and $\mu''_{O_2}$ respectively, are related to the emf as follows:


$$ E = \frac{1}{zF} \int_{\mu'_{O_2}}^{\mu''_{O_2}} t_{ion}\, d\mu_{O_2} \tag{9} $$


where z is the number of electrons required to ionise an oxygen molecule according to the reaction


$$ \mathrm{O_2} + 4e^{-} = 2\,\mathrm{O^{2-}} \tag{10} $$


F is the Faraday constant and $t_{ion}$ is the ionic transport number.

Since $\mu_{O_2}$ can be related to the oxygen partial pressure, this equation simplifies to the Nernst equation (4)
when $t_{ion} = 1$, in other words, when the solid electrolyte exhibits pure ionic conduction. Since the ionic
conductivity is a function of the prevailing oxygen pressure, it is important to define a parameter which
describes the electrical characteristics of the solid electrolyte irrespective of the partial pressure of oxygen.
Schmalzried [10] carefully analysed Wagner's model and proposed such a parameter, which he called $P_e$
and which characterises the electronic conductivity of a solid electrolyte. $P_e$ is defined as the partial
pressure of oxygen at which $t_{ion} = 0.5$. In mathematical terms this statement may be expressed as:

$$ t_{ion} = \left[ 1 + \left( \frac{p_{O_2}}{P_e} \right)^{-1/4} \right]^{-1} \tag{11} $$

Substitution of this criterion into Wagner's equation leads to the following relationship:

$$ E = \frac{RT}{F} \ln \frac{(p''_{O_2})^{1/4} + P_e^{1/4}}{(p'_{O_2})^{1/4} + P_e^{1/4}} \tag{12} $$
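
The step from Wagner's equation to this relationship is not spelled out in the text. A short derivation sketch follows, under two assumptions: that z = 4, and that the ionic transport number has the usual n-type form $t_{ion} = [1 + (p_{O_2}/P_e)^{-1/4}]^{-1}$, which equals 0.5 at $p_{O_2} = P_e$ as the definition of $P_e$ requires. The substitution variable u is introduced here only for the integration.

```latex
\begin{align*}
\mu_{O_2} &= \mu^{\circ}_{O_2} + RT \ln p_{O_2}
  \quad\Rightarrow\quad d\mu_{O_2} = RT\, d\ln p_{O_2} \\
E &= \frac{RT}{4F} \int_{p'_{O_2}}^{p''_{O_2}}
     \frac{p_{O_2}^{1/4}}{p_{O_2}^{1/4} + P_e^{1/4}}\; d\ln p_{O_2} \\
  &= \frac{RT}{F} \int_{u'}^{u''} \frac{du}{u + P_e^{1/4}},
     \qquad u = p_{O_2}^{1/4}, \quad d\ln p_{O_2} = \frac{4\,du}{u} \\
  &= \frac{RT}{F} \ln
     \frac{(p''_{O_2})^{1/4} + P_e^{1/4}}{(p'_{O_2})^{1/4} + P_e^{1/4}}
\end{align*}
```

For $P_e \to 0$ the last line reduces to $(RT/4F)\ln(p''_{O_2}/p'_{O_2})$, which is the Nernst equation with n = 4.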

Thus, if the extent of electronic conduction of a solid electrolyte, characterised by the Pe value, is known,
an accurate measure of the prevailing oxygen potential in liquid metals or slags can be obtained from a
measurement of the emf of the electrochemical cell. It is consequently of great importance to assess to what
extent accurate values of Pe can be obtained experimentally and also to determine what material properties
and processing parameters have an influence on the electronic conductivity. Hence, the Pe-value of a
particular solid electrolyte ought to be determined before its use. It is also important to be aware that an
erroneous emf may be measured if electronic conduction in the solid electrolyte causes transportation of
ions which, in turn, will cause polarisation, or if chemical interaction between the melt and the electrolyte
influences the electrical characteristics of the electrolyte. In addition, mass transport of oxygen in the melt may
influence the absolute value of the emf measured. Mass transport becomes important, especially in those
cases where multi-phase equilibria need to be established.
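
How a known Pe value is used in practice can be sketched in code. The example below assumes the mixed-conduction relationship in its usual form, E = (RT/F) ln[(p''^(1/4) + Pe^(1/4)) / (p'^(1/4) + Pe^(1/4))], and inverts it for the unknown lower oxygen pressure p'. The function names and all numerical values are hypothetical and serve only to show the round trip.

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol


def mixed_conduction_emf(p2, p1, p_e, temp_k):
    """emf (V) of a cell whose electrolyte has electronic conduction parameter p_e."""
    num = p2 ** 0.25 + p_e ** 0.25
    den = p1 ** 0.25 + p_e ** 0.25
    return R * temp_k / F * math.log(num / den)


def unknown_pressure_mixed(emf, p2, p_e, temp_k):
    """Invert the relationship for p1 (p') given the emf, the known p2 (p'') and p_e."""
    root = (p2 ** 0.25 + p_e ** 0.25) * math.exp(-F * emf / (R * temp_k)) - p_e ** 0.25
    if root <= 0.0:
        raise ValueError("emf exceeds what this electrolyte can report for the given p_e")
    return root ** 4


# Hypothetical values, for illustration only.
T_BATH = 1873.0
P_REF = 1.0e-9     # known reference-side pressure (atm)
P_MELT = 1.0e-13   # melt-side pressure to be recovered (atm)
P_E = 1.0e-15      # assumed Pe value of the electrolyte (atm)

emf = mixed_conduction_emf(P_REF, P_MELT, P_E, T_BATH)
print(f"emf with electronic conduction = {emf:.4f} V")
print(f"recovered melt p_O2 = {unknown_pressure_mixed(emf, P_REF, P_E, T_BATH):.3e} atm")
```

The explicit error raised for a non-positive root reflects a physical limit: as p' falls far below Pe the emf saturates, so emf values beyond that limit cannot correspond to any oxygen pressure.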

A  single  example  will  illustrate  the  importance  of  knowing  precisely  the  Pe-value  of  the  solid  electrolyte 
used  in  an  electrochemical  cell  which  is  used  to  determine  the  oxygen  potential  of  a  liquid  metal.  As 
background  it  is  of  some  significance  to  refer  to  requirements  in  the  stainless  steel  industry.  The  amount  of 
alloying  additions,  the  optimum  slag  composition  required  to  ensure  efficient  refining  and  the  details  of  the 
process  route  to  be  followed  are  usually  determined  by  the  use  of  mathematical  models  and  artificial 
intelligence systems. A knowledge of the activity of chromium in the melt is of critical importance in
calculating these requirements, simply because it is the prevailing activity of chromium which dominates the
slag/metal equilibria of importance. Hence, the intelligent processing aids employed in process control are
of little value if the activity of chromium as a function of alloy content of the molten bath is not known to a
high degree of accuracy. Experimentally, the chromium activity can be calculated from a knowledge of the
oxygen potential in a liquid iron-chromium alloy which is in equilibrium with a Cr2O3-saturated slag. This
value, in turn, is determined by an electrochemical sensing technique as described above, employing a
zirconia solid electrolyte as the core component.
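
As a sketch of why the oxygen potential fixes the chromium activity, assume that the slag is saturated with pure solid Cr2O3 (unit activity) and that the governing reaction is the formation of Cr2O3 from dissolved chromium and oxygen. The choice of reaction and the equilibrium constant K are assumptions made here for illustration; they are not given in the text.

```latex
2\,\mathrm{Cr} + \tfrac{3}{2}\,\mathrm{O_2} = \mathrm{Cr_2O_3}, \qquad
K = \frac{a_{\mathrm{Cr_2O_3}}}{a_{\mathrm{Cr}}^{2}\, p_{\mathrm{O_2}}^{3/2}}
\;\;\Rightarrow\;\;
a_{\mathrm{Cr}} = K^{-1/2}\, p_{\mathrm{O_2}}^{-3/4}
\quad (a_{\mathrm{Cr_2O_3}} = 1)

\frac{\mathrm{d}a_{\mathrm{Cr}}}{a_{\mathrm{Cr}}}
  = -\frac{3}{4}\,\frac{\mathrm{d}p_{\mathrm{O_2}}}{p_{\mathrm{O_2}}}
```

On this basis, any systematic error in the oxygen partial pressure deduced from the sensor emf carries over directly into the calculated chromium activity.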

About thirty years ago Fruehan, in pursuit of optimising process control, measured the chromium activity in
iron using such an oxygen probe [11]. Many years later, Geldenhuis [12,13] repeated these same
measurements for different reasons but, interestingly enough, found a discrepancy of some 30% in the
chromium activities at chromium compositions of commercial interest. It was possible to resolve this
discrepancy by consideration of the properties of the solid electrolytes used in the different experiments. At
the time Fruehan did his experiments, the electrical characteristics of zirconia electrolytes were not known
and he was forced to assume that the solid electrolyte was an ideal ionic conductor. Geldenhuis, on the other
hand, measured the Pe-value of the solid electrolyte he was using. Fortunately, Fisher and Pieper [14] had
by then measured the Pe-value of an electrolyte similar to that used by Fruehan and, when the activities
Fruehan measured were recalculated using this Pe-value, the recalculated activities agreed with those of
Geldenhuis. This example is important because it shows that a 30% error in the experimentally determined
values of the chromium activity in liquid iron can result from a lack of knowledge of the Pe-value of the
solid electrolyte used in the sensor, emphasising the need to know the Pe-value exactly.
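
The effect of neglecting Pe can be illustrated numerically. The sketch below assumes the widely used mixed-conduction form of the cell emf, E = (RT/F) ln[(p''^(1/4) + Pe^(1/4)) / (p'^(1/4) + Pe^(1/4))], which is taken here to correspond to equation 12 (not reproduced in this section). All numerical values (temperature, reference pressure, Pe, melt oxygen pressure) are illustrative placeholders, not data from Fruehan, Geldenhuis or Fisher and Pieper.

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol


def cell_emf(p_low, p_high, pe, temp_k):
    """emf (V) of an oxygen concentration cell whose electrolyte has a
    characteristic pressure pe for n-type electronic conduction
    (assumed mixed-conduction form, see lead-in)."""
    k = R * temp_k / F
    return k * math.log((p_high ** 0.25 + pe ** 0.25) /
                        (p_low ** 0.25 + pe ** 0.25))


def p_low_from_emf(emf, p_high, pe, temp_k):
    """Invert cell_emf for the unknown (lower) oxygen partial pressure."""
    k = R * temp_k / F
    root = (p_high ** 0.25 + pe ** 0.25) * math.exp(-emf / k) - pe ** 0.25
    if root <= 0.0:
        raise ValueError("emf too large for this pe: the cell is saturated")
    return root ** 4


def p_low_ideal(emf, p_high, temp_k):
    """Same inversion, but treating the electrolyte as a purely ionic
    conductor (pe = 0), i.e. the assumption forced on early workers."""
    return p_high * math.exp(-4.0 * F * emf / (R * temp_k))


if __name__ == "__main__":
    temp_k = 1873.0      # placeholder temperature
    p_ref = 0.21         # placeholder reference pressure (air), atm
    pe = 1e-14           # placeholder Pe-value, atm
    p_true = 1e-12       # placeholder oxygen pressure in the melt, atm

    emf = cell_emf(p_true, p_ref, pe, temp_k)
    print("measured emf (V):", emf)
    print("pO2 with Pe correction:", p_low_from_emf(emf, p_ref, pe, temp_k))
    print("pO2 assuming ideal ionic conduction:", p_low_ideal(emf, p_ref, temp_k))
```

With a non-zero Pe, the measured emf is lower than the ideal value, so interpreting it with the ideal formula overestimates the oxygen pressure in the melt; this is the kind of systematic bias the recalculation of Fruehan's data removed.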

Experimental  measurement  of  the  Pe  value 

The Pe-value of a solid electrolyte can be determined most effectively by measuring the emf of a cell of the type:

$$p'_{O_2} \;|\; \text{solid electrolyte} \;|\; p''_{O_2}$$

where

$$p'_{O_2} \ll P_e \ll p''_{O_2}$$

By imposing this condition, equation 12 becomes:

$$E = \frac{RT}{4F} \ln \frac{p''_{O_2}}{P_e} \qquad (13)$$

The value of Pe can therefore be determined directly if the value of p''O2 is known.
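
As a minimal sketch of how a Pe-value follows from a measured emf, the limiting relation E = (RT/4F) ln(p''O2/Pe), valid under the condition p'O2 << Pe << p''O2, can be inverted for Pe. The function name and the numerical inputs below are hypothetical and chosen only for illustration.

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol


def pe_from_plateau_emf(emf_v, p_ref, temp_k):
    """Return Pe (same units as p_ref) from the open-circuit plateau emf,
    using E = (RT/4F) * ln(p_ref / Pe)."""
    return p_ref * math.exp(-4.0 * F * emf_v / (R * temp_k))


if __name__ == "__main__":
    # Hypothetical reading: 1.1 V plateau at 1673 K against an air reference.
    print(pe_from_plateau_emf(1.1, 0.21, 1673.0))
```

Because Pe depends exponentially on the emf, a small error in the plateau reading translates into a large relative error in Pe, which is why a stable plateau matters.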


The  so-called  coulometric  titration  method  [15]  embodies  these  principles  and  is  usually  used  to  measure 
the  Pe-value  of  solid  electrolytes.  A  typical  cell  arrangement  for  such  a  measurement  is  shown  in  Figure  2. 


Fig.  2.  Cell  arrangement  for  Coulometric  Titration 




Experimentally, the condition p'O2 << Pe is imposed by passing a direct electrical current under an applied
potential of approximately 5 V through the cell so as to remove oxygen electrolytically from the silver,
transporting it to the Pt/Ag electrode and thereby polarising the silver interface of the electrolyte. Since the
Pt/O2 electrode is essentially non-polarisable, the oxygen partial pressure at this interface of the electrolyte
remains atmospheric, hence maintaining the condition Pe << p''O2. As soon as the polarising current is
interrupted, a stable open-circuit emf plateau is obtained for about 4 seconds, after which the potential
decays slowly. However, it is only possible to obtain a stable plateau if there is no leakage of O2 back into the
molten Ag, and so a gas-tight experimental assembly is a prerequisite if reliable Pe measurements are to be
made. An example of the Pe-values of a long MgO-stabilised zirconia tube, as a function of temperature, is
shown in Figure 3.


Fig. 3. Pe-values of a long magnesia-stabilised zirconia solid electrolyte tube as a function of temperature
(abscissa: 1/T x 10^4, K^-1)

It is interesting to note that the solid electrolyte displays different Pe-values on heating and cooling below a
temperature of 1400 °C and, furthermore, that this behaviour is irreversible. However, the Pe-value
measured above 1400 °C is the same whether a heating or cooling cycle is used. This behaviour is related to
the phase composition in the ZrO2-MgO system and it indicates that phase equilibrium in the solid
electrolyte must be ensured before reliable measurements can be made.
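
Pe-values of this kind are plotted against 1/T, which suggests, although the text does not state it, that an Arrhenius-type relation log10(Pe) = a + b/T may be a convenient summary of each branch (heating or cooling). The following is a minimal least-squares sketch under that assumption; the function name is hypothetical and the sample numbers are synthetic placeholders generated from assumed coefficients, not values read from Figure 3.

```python
def fit_log_pe(temps_k, log10_pe):
    """Least-squares fit of log10(Pe) = a + b / T.

    temps_k  : temperatures in kelvin
    log10_pe : log10 of the measured Pe-values at those temperatures
    Returns (a, b). Fit the heating and cooling branches separately,
    since they differ below about 1400 degrees C."""
    if len(temps_k) != len(log10_pe) or len(temps_k) < 2:
        raise ValueError("need at least two (T, log10 Pe) pairs")
    xs = [1.0 / t for t in temps_k]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(log10_pe) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, log10_pe))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


if __name__ == "__main__":
    # Synthetic placeholder data generated from assumed a = 5, b = -30000.
    temps = [1500.0, 1600.0, 1700.0, 1800.0]
    logs = [5.0 - 30000.0 / t for t in temps]
    print(fit_log_pe(temps, logs))
```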

Because  the  electrical  characteristics  of  a  solid  oxygen  electrolyte  are  dependent  not  only  on  the  bulk 
chemical  composition  but  also  on  the  manufacturing  technique,  heat  treatment  and  phase  composition,  the 
Pe-values  of  the  small  solid  electrolyte  tubes  which  are  actually  being  used  in  the  commercial  oxygen 
sensors need to be determined. In the past it was not possible to use the coulometric titration technique for
this purpose because of the difficulties encountered in constructing a gas-tight seal between the small
electrolyte tubes and the extension tube which is required to position the small tube in the hot-zone of the
furnace. Recourse had to be taken to measurements on long tubes which were manufactured to simulate the
properties of the tubes actually used in the commercial sensors. However, a couple of years ago,
researchers at the University of Pretoria overcame this problem by constructing a gas-tight seal
which enabled the small tubes to be positioned in the hot-zone of the furnace [16,17]. It is the
construction and use of this seal which has led to the accurate measurement of the Pe-values of the solid
electrolytes which are used in commercially available oxygen probes.

The  construction  of  such  a  gas-tight  seal  is  schematically  shown  in  Figure  4. 


Fig. 4. Schematic illustration of the construction of a gas-tight ceramic seal, before and after heat treatment
(labelled parts: alumina extension tube; short electrolyte tube; slip-cast alumina coupling, unfired before
heat treatment and fired after heat treatment)

In  principle,  a  short  electrolyte  tube  is  coupled  onto  an  alumina  extension  tube  by  means  of  a  slip-cast 
alumina  coupling.  Experimentally,  a  slip-cast  alumina  disc  was  presintered  at  900  °C  to  strengthen  it, 
followed  by  machining  of  the  appropriate  holes.  The  small  electrolyte  tube,  alumina  extension  tube  and  the 
alumina  coupling  were  assembled  and  slowly  heated  to  1500  °C,  held  for  2  h  and  slowly  cooled  to  room 
temperature.  During  the  heat  treatment  the  slip-cast  alumina  disc  shrinks  and  forms  a  strong  gas-tight  seal 
round  the  surfaces  of  the  electrolyte  and  the  alumina  tube.  Using  such  seals,  the  Pe-values  of  short  solid 
zirconia  electrolytes  could  be  measured  and  the  results  of  such  experiments  are  shown  in  Figure  5. 


Fig. 5. Pe-values of short magnesia-stabilised zirconia solid electrolyte tubes,
of the type used in industrial practice, as a function of temperature
(abscissa: 1/T x 10^4 (K^-1), from 5.3 to 6.1)

DISCUSSION 

The late Keith Brimacombe had a passion to quantify the linkage between process and product. He
developed and implemented mathematical models and artificial intelligence at a time when other
process metallurgists still feared to tread. Another passion was to validate such models, and many examples
of careful experimental measurements in the laboratory and pioneering measurements on the plant are to be
found in his work. However, he emphasised not only the validation of models but also the validation of the
data which are used as input to these models. I have attempted to illustrate the great importance of the
validation of such data, in this instance the electronic properties of solid electrolytes which are used daily in
most steel plants. That brings me to the third of Keith's many passions: the interaction between university
and industry. He impressed industry and the scientific community alike with his remarkable talent to
nurture collaboration and, as a consequence, received wide recognition for these efforts. I am glad to say
that the work on the experimental measurement of the oxygen potential, conducted at the University of
Pretoria, was done in close collaboration with the steel industry as well as the ceramics industry and

Thus  the  engine  of  knowledge  generation 
(was)  coupled  with  the  engine  of  wealth  creation 

J  Keith  Brimacombe  (1993)  [18] 

The  development  of  the  gas-tight  ceramic  seal  has  made  it  possible  to  measure  the  Pe-value  of  the  small 
electrolytes  used  in  commercially  available  oxygen  sensors.  This  means  that  the  prevailing  oxygen 
potential  can  be  determined  to  a  high  degree  of  accuracy  and  consequently,  the  equilibrium  distribution  of 
oxidisable  elements  between  the  liquid  metal  and  the  slag  can  be  calculated  much  more  reliably.  Because 
the  prevailing  oxygen  potential  in  the  liquid  metal/liquid  slag  system  plays  such  a  pivotal  role  in  the 
mathematical  models  which  in  turn,  form  an  integral  part  of  the  intelligent  process  control  systems  used  in 
steel  processing,  a  proper  knowledge  of  the  oxygen  potential  should  enhance  effective  process  control. 


I should like to conclude with reference to the well-known words of Henry Longfellow, which I think
apply to our gathering here today:

Lives  of  great  men  all  remind  us 
We  can  make  our  lives  sublime, 

And,  departing,  leave  behind  us 
Footprints  on  the  sands  of  time. 

Let  us  pay  tribute  to  our  dear  departed  friend  by  striving  to  achieve  those  ideals  he  so  passionately  fought 
for.  Let  us,  like  him,  relentlessly  labour  in  pursuit  of  the  truth  and 

Let us, then, be up and doing,

With  a  heart  for  any  fate; 

Still  achieving,  still  pursuing, 

Learn  to  labour  and  to  wait. 


A  Psalm  of  Life  (1838) 

Henry  Wadsworth  Longfellow 


REFERENCES 

1. J.K. Brimacombe, 1989. The extractive metallurgist in an emerging world of materials. Metall. Trans. B,
20B, June, 291-313.

2. F. Oeters and H. Xie, 1995. A contribution to the theoretical description of metal-slag reaction kinetics.
Steel Research, 66(10), 409-415.

3. A. McLean, 1990. Sensor aided process control in iron and steelmaking. Solid State Ionics, 40/41, 737-742.

4. M. Iwase, 1992. Developments in zirconia sensors during the 1980s - Laboratory and in-plant
applications in iron and steelmaking. INFACON 6, Proc. of the 1st Chromium Steel and Alloys
Congress, Cape Town, V.2, Johannesburg, SAIMM, 49-61.

5. D. Janke, 1990. Recent developments of solid ionic sensors to control iron and steel bath composition.
Solid State Ionics, 40/41, 764-769.




6. S. Tu and D. Janke, 1995. EMF sensor control in vacuum decarburization and deoxidation of steel
melts. ISIJ Int., 35(11), 1362-1367.

7. S. Tu, V. Burzev and D. Janke, 1995. EMF sensing of Al, Ti and Cr dissolved in pure iron melts.
Trans. ISS, Nov, 61-68.

8. M. Iwase and A. McLean, 1990. Sensors for iron and steelmaking. Proc. of the 6th International Iron and
Steel Congress, Nagoya, Japan, ISIJ, 521-528.

9. C. Wagner, 1933. Theory of Tarnishing Process. Z. Physik. Chem., Abt. B, 21, 25-41.

10. H. Schmalzried, 1962. Über Zirkondioxyd als Elektrolyt für elektrochemische Untersuchungen bei
höheren Temperaturen. Z. Elektrochemie, 66, 572-576.

11. R.J. Fruehan, 1969. Activities in the liquid Fe-Cr-O system. Trans. Met. Soc. AIME, 245, 1215-1218.

12. J.M.A. Geldenhuis, 1991. Development of electrochemical sensing techniques for the determination of
activity-composition relations in liquid alloys and slags at 1873 K. PhD thesis, University of Pretoria.

13. J.M.A. Geldenhuis and R.J. Dippenaar, 1991. A reassessment of the activity of chromium in the Fe-Cr-O
system at 1873 K. Metall. Trans. B, Vol 22B, Dec, 915-917.

14. W.A. Fisher and C. Pieper, 1973. Die elektrische Leitfähigkeit und Thermokraft von reinem und mit
Calciumoxid stabilisiertem Zirkonoxid bei Temperaturen zwischen 1000 und 1700 °C und
Sauerstoffpartialdrücken zwischen 1 und 10^-16 atm. Arch. Eisenhüttenwes., V.44, 251-259.

15. D.A.J. Swinkels, 1970. Rapid determination of electronic conductivity limits of solid electrolyte. J.
Applied Electrochemistry, 1297-1298.

16. J.M.A. Geldenhuis, 1988. The quantitative determination of certain electrical characteristics of
commercial zirconia solid electrolytes (in Afrikaans), University of Pretoria.

17. M.J.U.T. van Wijngaarden, J.M.A. Geldenhuis and R.J. Dippenaar, 1988. An experimental technique
employing a high-temperature gas-tight alumina seal for the assessment of the electrical properties of
solid electrolytes. J. Applied Electrochemistry, 18, 724-730.

18. J.K. Brimacombe, 1993. Prosperity into the next millennium built on the process engineering of
materials. 1st Int. Conf. on Processing Materials for Properties, Eds. Henein and Oki, The Minerals,
Metals and Materials Soc., 7-15.




Computer Simulation and Information Management Systems for Material Processing

Yoshiyuki  Nagasaka 

Department  of  Distribution  Science,  Osaka  Sangyo  University 
3-1-1  Nakagaito,  Daito,  Osaka,  574-8530,  Japan 

ABSTRACT 

Several computer simulation and information systems for the machinery industries have been studied. The
objective is to construct a virtual manufacturing environment for concurrent engineering. For such an
environment, three types of model should be considered: a geometry model for the product, a mathematical
model for physical phenomena and an activity model for human operations. In this study,
computer simulation systems for casting and heat treatment processes have been developed as tools of a
virtual manufacturing environment, taking into account the interface to the geometry model from a
three-dimensional CAD system. These computer simulations can predict quality and productivity
quantitatively. In addition, an information management system for pre-production material processes based
on activity models has been developed. Background data for decision-making is stored and related to
foreground information such as multimedia data. A summary of the system is given and several examples
are presented.


INTRODUCTION 

If the manufacturing conditions can be optimized in the design stage before actual trials, the total cost and
lead-time can be reduced drastically. Generally speaking, mechanical designers should consider function
and productivity simultaneously, but this is not easy. Nevertheless, they must state definitely in the drawing
which process is to be used for each part. For example, it is very important to select the proper material and
heat treatment process for gear production. The problem is that, so far, manufacturing engineers have had no
generic method of showing quantitative data to the designers before actual trials. A VME
(Virtual Manufacturing Environment) is thought to be one of the most effective technologies for solving this
problem. In this study, several computer simulations and information systems have been developed to
construct a virtual manufacturing environment for materials processing.

For a virtual manufacturing environment, geometry models and mathematical models for physical
phenomena should be considered as well as activity models for operations. Recently, three-dimensional
CAD systems have come into wide use, making geometry models easy to handle. In practice, many
pre-production processes are performed with the geometry in mind. Activity models for pre-production
processes are useful for passing on and improving skilled operations. Computer simulation based on
mathematical models is also useful for predicting productivity and quality in materials processing. These
tools become effective when they are harmonized with practical work.

Although several commercial software packages for material processing are available, it is sometimes
desirable to develop an original computer simulation system because these packages do not fully satisfy the
needs of practical operation. However, it is not easy for a single company to develop such a sophisticated
system. One good solution may be to form a consortium, which can consist of companies, universities and
government organizations brought together for a specific objective. A computer simulation system for
casting processes, named "JS-CAST", has been developed by a Japanese consortium. Mold filling and
solidification phenomena can be simulated to predict shrinkage cavities and optimize mold design. The
architecture of "JS-CAST" is open to users; that is, it is possible to develop other unique solvers and combine
them with "JS-CAST". This function helps the CAE system to grow. In addition, another computer
simulation system for heat treatment, named "GRANTAS", has been developed. It is based on a
mathematical model of phase transformations and elasto-plastic stresses. Hardness and residual stress
distributions can be predicted.


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




Furthermore, an information management system for material pre-production processes, named "JCAP", has
been developed based on an activity model, taking into account the requirements of practising engineers. The
objectives of the system are to achieve more efficient and rational pre-production design and to
pass on traditional technologies. A methodology has been developed to implement the activity model in a
computer. The activity model includes task models, which are structured as a tree and related to a data
model that includes background information. Documents, drawings and computer files are created through
several tasks. The foreground information can be obtained by looking at these results, but it is usually very
difficult to know why a particular value was fixed. That is, the background information is needed to
understand how the engineer made a decision. For example, a drawing alone does not show why and how a
dimension was determined. In this system, the design background information is stored and linked to each
object in the drawing, which helps to verify the design conditions.
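
The idea of a task tree whose data objects carry both a value (foreground information) and the reason for it (background information) can be sketched as a small data structure. This is not the JCAP implementation, which is not described here; the class names, fields and the sample casting task are hypothetical and serve only to illustrate the linkage.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DesignDatum:
    """One design value together with the reasoning behind it."""
    name: str        # e.g. a dimension on a drawing (foreground)
    value: str       # the value that appears in the result (foreground)
    rationale: str   # why the engineer chose it (background)


@dataclass
class Task:
    """A node of the task tree; each task may produce data and own subtasks."""
    name: str
    data: List[DesignDatum] = field(default_factory=list)
    subtasks: List["Task"] = field(default_factory=list)

    def find_rationale(self, datum_name: str) -> Optional[str]:
        """Depth-first search for the background information of a datum."""
        for datum in self.data:
            if datum.name == datum_name:
                return datum.rationale
        for task in self.subtasks:
            found = task.find_rationale(datum_name)
            if found is not None:
                return found
        return None


if __name__ == "__main__":
    # Hypothetical example tree for a casting plan.
    riser = Task("Riser design", data=[
        DesignDatum("riser_diameter", "80 mm",
                    "enlarged after shrinkage was predicted near the boss"),
    ])
    plan = Task("Casting plan", subtasks=[riser])
    print(plan.find_rationale("riser_diameter"))
```

Keeping the rationale on the same object as the value means that anyone who retrieves a dimension can also retrieve the decision behind it.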


VIRTUAL  MANUFACTURING  ENVIRONMENT 
FOR  MATERIAL  PROCESSING 

Most companies continue to make efforts to utilize information technologies to improve quality and
productivity. In practice, individual computer systems have been developed and used to optimize individual
processes. CAD is a powerful means of efficient drawing, and CAM is widely used to produce metal molds
with very accurate dimensions. PDM is useful for storing and controlling large amounts of data related to a
product. Recently, the integration of these systems has progressed so as to optimize enterprise resources as a
whole. ERP is regarded as a representative system for controlling stock and managing human operations and
manufacturing procedures according to demand. Moreover, the concept of a virtual manufacturing
environment has been accepted and is being put into practical use. Digital manufacturing or digital
engineering has a similar meaning. The relation between these computer systems and business processes is
shown schematically in Figure 1.


Fig. 1. The relation of information systems and business processes.


A concurrent engineering approach can be carried out efficiently and powerfully in a virtual manufacturing
environment. That is, a virtual prototype is generated and examined jointly on networked
computers. A designer generates a geometry model, which is the origin of the virtual prototype. The
prototype is transferred to manufacturing designers, cost estimation engineers, supply division workers and
so on, so that many examinations can be performed simultaneously. At the same time, a large amount of
explicit data such as weight, cost, supplier's name and delivery date is attached to the prototype. During
these processes, requests to modify the design inevitably arise many times because productivity and
reliability are checked from several different viewpoints. For example, a virtual environment for assembly is
very useful for checking interference problems and estimating assembly time. If problems are found, the
original geometry may be changed. An off-line teaching system for industrial robots is also a very powerful
means of setting up NC data in a short time without inconsistency. A CAE system has been developed for
each process and for its specific purpose, as shown in Table 1.




Recently, these CAE systems have come to be used in an integrated virtual manufacturing environment. A
common geometry model, applicable to all tools in the environment, should be prepared. In addition,
not only explicit simulation such as geometry checking but also mathematical models that predict product
quality, such as fatigue strength and residual stresses, are very important. For instance, the scatter in
mechanical properties such as hardness distribution usually depends on the character of the material
processing and the original quality of the material. The designer should take the actual strength into account.
For these problems, computer simulations of material processing are useful. Although several
commercial software packages are available, they sometimes do not fully satisfy the needs of practical
operation. However, it is difficult for a single company to develop and improve such a sophisticated
system. One good solution may be to form a consortium consisting of companies, universities and
government organizations. A consortium in Japan has developed a simulation system for casting processes,
as shown later. Furthermore, the computer simulation for heat treatment developed by us is ready to be
developed further by another consortium. Not only computer simulations but also information management
systems are necessary to construct a virtual manufacturing environment. An information management
system for material pre-production processes has been developed by a consortium on the basis of an
activity model, as shown later. It is important to have tools that can show clear quantitative data and the
rationale of material processing to mechanical designers and others who are not familiar with the
processing. Such tools are important parts of a virtual manufacturing environment.


Table 1: Processes for machinery production and applicable CAE for VME

Process                                               Objective                            CAE
---------------------------------------------------------------------------------------------------------------------
Design                    Planning                    Design                               Design CAD
                                                      Layout, Vision                       Geometric Simulation
                          Parts                       Elastic stress                       Stress analysis
                                                      Fatigue life                         Fatigue analysis
                                                      Contact                              Contact stress analysis
                                                      Vibration                            Vibration analysis
                          Behavior                    Collision                            Collision analysis
                                                      Noise                                Noise analysis
                                                      Comfortableness                      Vibration analysis
                          Engine                      Power                                Combustion analysis
                                                      Efficiency                           Thermal fluid flow analysis
                          Piping, Harness             Layout                               Geometric Simulation
                                                      Assembly efficiency                  Geometric Simulation

Material processing       Casting                     Defect, Microstructure, Strength     Mold filling, solidification analysis
                          Forging                     Defect, Productivity                 Plastic deformation analysis
                          Powder metallurgy           Deformation, Productivity            Distinct Element analysis
                          Plastic forming             Productivity, Quality                Plastic deformation analysis
                          Polymer processing          Injection condition, Quality         Mold flow analysis
                          Electric device processing  Productivity, Quality                Molecular dynamics

Manufacturing processing  Welding                     Nesting, Deformation                 Thermal stress analysis
                          Robot teaching              Off-line teaching                    Robotics simulation
                          Machining                   Process planning                     Machining simulation
                          Heat treatment              Condition, Microstructure, Strength  Phase transformation, Thermal stress analysis
                          Surface treatment           Condition, Strength                  Chemical phenomena simulation

                          Component                   Assembly efficiency                  Geometric Simulation
                          Sub-assembly                Assembly efficiency                  Geometric Simulation
                          Total assembly              Assembly efficiency                  Geometric Simulation
                          Scheduling                  Optimum scheduling                   Operations research
                          Supply management           Optimum supplement                   Operations research



DEVELOPED  COMPUTER  SIMULATION  SYSTEMS  FOR  MATERIAL 
PROCESSING 

Mold  Filling  and  Solidification  Simulation  for  Casting  Processes 

The Research Committee of Casting CAE (Chairman: I. Ohnaka) was established at the Materials Process
Technology Center in Japan in 1992 to propose a suitable basic design for a new CAE system for casting
processes. The proposal was adopted as a development project for 1993 by the Japan Small Business
Corporation, which is one of the government organizations. A CAE system for casting processes named
"JS-CAST" was then completed as a result of three years of effort by many people. First, "JS-CAST" [2]
inherited the know-how and basic technology of "SOLDIA" [1], developed by Komatsu Ltd., which is one of
the most popular packages for foundries in Japan. Then, the results of research performed at
Osaka University were added to the CAE system. This approach reduced the development time and
cost. During development, the requirements listed below were fully considered.

1. Clear objectives of CAE System and low computation cost

2.  Accurate  predictions  and  user  friendly  operation 

3.  Open  architecture  and  enough  support  service 

The CAE system consists of a pre-processor, a main processor, a post-processor, CAD data interfaces and a
weight calculation module [2]. Mold filling and solidification phenomena can be simulated with rectangular
elements using DFDM (Direct Finite Difference Method). The CAE system is available on UNIX or
Windows. A Project Manager has been developed to provide more user-friendly operation. Once "JS-CAST"
is started, the Project Manager is displayed, with small windows showing the three-dimensional shapes of
the projects and indicating the progress of each. If a project's window is chosen, the menus that should be
selected next are highlighted for easy operation.

For three-dimensional geometry construction, digitizing of a drawing and use of two-dimensional CAD
data in IGES format are possible, as well as direct input of coordinates. Primitive shapes such as polyhedra,
cylinders and solids of revolution are created and combined by rotating, copying and moving. STL data
can also be used as direct input data. A rectangular mesh in x, y and z is generated automatically, taking into
account the maximum and minimum dimensions of the casting along the x, y and z directions. Furthermore,
the operator can edit the generated mesh easily. After these operations, the system automatically creates a
solid model of the casting including its mold. The mold-filling solver calculates transient fluid flow with
free surfaces and heat transfer phenomena. The temperature distribution after pouring is used as the initial
condition of the solidification solver. Solidification time and parameters characterizing shrinkage defects
are stored as calculated results. Post-processing using multiple windows is very useful for comparing several
results. Any section of the casting and mold can be scanned simply with a mouse operation. An operator
can check filling time, velocity, temperature, pressure, solidification time and temperature gradient on any
desired section from any three-dimensional view. An example is shown in Figure 2.
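
The automatic generation of a rectangular mesh from the extreme dimensions of the casting can be illustrated with a generic uniform grid. The meshing algorithm actually used in "JS-CAST" is not described in the text, so the function below, its name and its `target_cell` parameter are assumptions made purely for illustration.

```python
import math


def rectangular_mesh(bbox_min, bbox_max, target_cell):
    """Return node coordinates along x, y and z for a uniform rectangular
    grid that exactly spans the bounding box of the casting.

    bbox_min, bbox_max : (x, y, z) minimum and maximum dimensions
    target_cell        : desired cell edge length; the actual edge along
                         each axis is adjusted so that cells fit exactly."""
    if target_cell <= 0:
        raise ValueError("target_cell must be positive")
    axes = []
    for lo, hi in zip(bbox_min, bbox_max):
        if hi <= lo:
            raise ValueError("bounding box must have positive extent")
        n_cells = max(1, math.ceil((hi - lo) / target_cell))
        step = (hi - lo) / n_cells
        axes.append([lo + i * step for i in range(n_cells + 1)])
    return axes


if __name__ == "__main__":
    # Hypothetical casting envelope of 100 x 50 x 20 mm with 10 mm cells.
    xs, ys, zs = rectangular_mesh((0.0, 0.0, 0.0), (100.0, 50.0, 20.0), 10.0)
    print(len(xs) - 1, len(ys) - 1, len(zs) - 1)
```

An operator-editable mesh, as described above, would then amount to modifying these per-axis coordinate lists before the solid model is built.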


Fig. 2. A mold filling calculation of a casting: (a) a solid model for DFDM, (b) velocity distribution,
(c) temperature distribution




Phase  Transformations  and  Elasto-Plastic  Stress  Simulation  for  Heat  Treatment  Processes 

Mathematical models for thermo-elasto-plastic behavior during material processing have been studied
vigorously by J.K. Brimacombe's group [3]. Based on this research, a practical computer simulation system
for heat treatment processes using FEM (finite-element technique), named "GRANTAS", was developed [4].
The objective is to predict microstructure and hardness distribution, as well as distortion and residual
stresses, quantitatively in practical use. The linkages among temperature, stress, carbon distribution and
microstructure, which a model of carburizing and quenching must be able to predict, are shown in Figure 3.
The four sub-models which comprise the overall model should therefore be coupled, because the
microstructure is determined by the thermal history while the heat generated by the phase transformation
influences the temperature distribution; the volume changes produced by phase transformations and
variations in temperature result in thermal stresses. An elasto-plastic model is needed to compute stresses
because, normally, the yield stress is exceeded during quenching, causing residual stresses and distortion of
some magnitude.


Hardness distribution is one of the most important properties that mechanical designers want to know. To
estimate the hardness, the following procedure can be adopted. First, an equivalent Jominy distance is
obtained from the calculated cooling rate at each point by applying experimental relations; then, the
hardness is calculated from experimentally obtained Jominy curves. Where experimental data can be obtained
easily and generally, a mathematical model intended for practical use should take full advantage of them.

"GRANTAS" consists of a pre-processor, a main processor, a post-processor and a properties database. The
FEM mesh is generated semi-automatically from CAD data. After meshing, the operator must select the
material and the heat treatment conditions. Finally, calculated results are obtained as shown in Figures 4
and 5.


Fig. 3. The linkages among temperature, stress, carbon distribution and microstructure. (Diagram labels:
Heat Transfer Model; Elasto-Plastic Model; Phase Transformation Kinetics; temperature-dependent phase
transformation; transformation stress and plasticity; metallurgical structure.)


Fig. 4. A simulation of quenching (1/2 tooth of a gear): (a) FEM mesh, (b) temperature, (c) hardness.

Fig. 5. Comparison of predicted and measured hardness distribution in a gear (hardness plotted against
distance from surface, 0 to 4 mm).




An  Information  Management  System  for  Material  Pre-Production  Processes  based  on  Activity  Model 

In general, pre-production work is non-routine and depends heavily on the personal expertise of the
person in charge. It is therefore necessary to share and pass on important information in its natural form
[5]. Here, a methodology was examined for registering the background information behind decisions, in
addition to formal information, in a database and relating it to an activity model and a CAD system. The
requirements for an information system that supports pre-production are summarized as follows. (1)
Complete data management, with which data in various forms such as figures, pictures or CAD drawings can
be easily accumulated and retrieved. (2) Clear modeling of pre-production work, linked to a database of
background information relating to each process. (3) A means of accumulating data with little operator
effort. (4) The ability to manage both information that should be controlled centrally and uniformly for
the company and information that should be accumulated and distributed for individual persons in charge
or small groups.

An activity model is a model of the entire production activity, which includes elements such as working
procedures, persons in charge, negotiation, environment, the background of decision making and so forth.
Here, the activity model was implemented for practical use, and infrastructure-like software named
"J-CAP" has been developed. The activity model is built from a combination of three elements: a
hierarchical task model, the design items to be determined at each work level (such as shape, working
condition or specification), and the background information related to each design item. These three
elements are arranged so that they can be freely registered or changed and their mutual relationships can
be modified or dynamically constructed, as shown in Figure 6. An integrated system was thus developed in
which this activity model, a multi-media database and CAD are combined. All operations are accumulated as
log information, and determined values are accumulated as part of the product information. This
arrangement makes it possible to select effective background information, such as frequently used
information or the basis for obtaining reliable quality. The relationship between a determined value and
the background information used to determine it can be referenced when the grounds of a production design
are reviewed, and it can also be used to check whether the determined value contains an error. It should
be possible to revise and improve production design standards using this system.

SUMMARY  AND  CONCLUSIONS 

The concept of a virtual manufacturing environment has been discussed and summarized. To construct such an
environment, practical simulation systems for casting and heat treatment processes have been developed and
introduced. The information management system developed for material pre-production processes, based on
an activity model, has also been found useful in practice.


Fig.6.  Man-machine  interface  for 
activity  model 


REFERENCES 

1. Y. Nagasaka, S. Kiguchi, M. Nachi and J.K. Brimacombe. 1989. Three Dimensional Computer
Simulation of Casting Processes. AFS Transactions, 89-117, 553-564.

2. I. Ohnaka, Y. Nagasaka and T. Murakami. 1996. A Computer Simulation System of Casting Processes
for Concurrent Engineering Approach. Proc. of MCS3-96, 46-51.

3. Y. Nagasaka, J.K. Brimacombe, E.B. Hawbolt, I.V. Samarasekera, B. Hernandez-Morales and S.E. Chidiac.
1993. Mathematical Model of Phase Transformations and Elastoplastic Stress in the Water Spray
Quenching of Steel Bars. Metallurgical Trans. A, 24A, 795-808.

4. H. Shichino, Y. Nagasaka, T. Takahashi and N. Hamasaka. 1992. Computer Simulation for Heat
Treatment of Gears. Proc. of the 8th Int. Heat Treatment of Materials Conf., 597-600.

5. Y. Nagasaka, I. Ohnaka and T. Murakami. 1997. An Intelligent Casting CAD System Based on Multi-Media
Database. Proc. of IPMM97, 490-496.




Simulation  of  Springback  with  the  Draw/Bend  Test 

K.P.  Li,  L.M.  Geng  and  R.H.  Wagoner 

Dept.  Materials  Science  and  Engineering,  The  Ohio  State  University, 
Columbus  OH  43210  USA 


ABSTRACT 

This paper summarizes a lengthy analysis of springback in the draw/bend test, conducted using three sheet
materials, several friction coefficients, die radii, and draw-in restraining forces. At the last IPMM, in
1997, early results were presented that showed large discrepancies between experiments and simulations for
some conditions. The simulations have been optimized since that time, and their sensitivity to a variety of
numerical parameters and, more recently, to the choice of finite element and material model has been
examined. The finite element analysis (FEA) of springback is shown to be very sensitive to numerical
parameters, including the number of through-thickness integration points, the angle of contact per shell
element, and the tolerances for equilibrium and contact. With the help of numerical sensitivity studies,
guidelines are provided for choosing these values effectively. Good agreement between experimental and
simulated (3-D FEM modeling) springback has now been obtained for a range of process parameters. From this
further analysis, it is concluded that 3-D bending (anticlastic curvature) effects dominate the
discrepancies, with smaller errors caused by the material model.


INTRODUCTION 

Springback  describes  the  change  of  shape  of  a  formed  part  upon  removal  from  the  tooling  after  forming. 
Sheet  metals  are  particularly  prone  to  springback  because  of  their  weak  bending  resistance,  especially  for 
materials  with  high  strength-to-modulus  ratios  (aluminum  and  high  strength  steel,  for  example).  These 
materials  are  becoming  more  important  in  the  automotive  industry  to  reduce  vehicle  mass  and  increase  fuel 
efficiency.  Springback  causes  difficulty  in  the  die  design  process  because  the  final  part  shape  does  not 
conform  to  the  tool  geometry.  In  order  to  compensate  for  springback,  die  tryout  is  required  in  the  current 
automotive  die  development  and  construction  process.  Accurate  prediction  of  springback  is  critical  to  reduce 
the  lead  time  of  tool  design,  thereby  saving  time  and  money.  Finite  element  analysis,  which  has  been 
demonstrated  successfully  for  many  complex  industrial  forming  operations,  has  not  yet  shown  the  same 
reliability  or  accuracy  for  sheet  springback  applications  [1-4]. 

In the literature there are few papers dealing systematically with springback prediction. The role of
work-hardening in plane strain has been investigated [5-7], and springback has been shown to be reduced by
tensile loads greater than yield [7]. Other work has focused on the change of elastic modulus with plastic
strain [8] and the presence of the Bauschinger Effect [9-11]. Use of isotropic hardening (i.e., without the
Bauschinger Effect) has been identified [8,16] as a cause of inaccurate springback simulation.

Parametric studies, either experimental [12-15] or numerical [1-3], have been conducted. Numerical effects
on the accuracy of bending treatment were addressed in a limited manner. Based on a chordal deviation
analysis, Frey and Wenner [16] studied the mesh size limitation at the die corner and proposed that there
should be at least one contacting finite element per 10° of contact angle. The role of element size and
hardening law in numerical problems with combined explicit loading and implicit unloading was studied [17],
and springback of the wrong sign was reported when simulations are carried out with large elements.
Stability problems with explicit simulations prior to springback analysis were identified for particular
program combinations [18].

The  magnitude  of  springback  depends  on  the  geometry  of  the  deformed  sheet  and  the  distribution  of  the 
bending  moment,  in  the  plane  of  the  sheet  at  the  end  of  stamping  operations.  Most  codes  can  give  an  accurate 
prediction  of  the  geometry,  even  dynamic  explicit  codes  with  numerical  acceleration  schemes.  However,  the 


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




correct  prediction  of  stress  distribution  in  the  structure  is  sensitive  to  a  range  of  variables.  The  physical 
sensitivity  depends  on  material  properties,  hardening  laws,  friction  coefficient  and  possibly,  the  unloading 
procedure.  Numerical  sensitivity  depends  on  the  number  of  integration  points  through  the  thickness,  the  type 
of  element  (plane  strain,  plane  stress,  3-D  shell  and  3-D  solid,  etc.),  the  mesh  size,  and  convergence  tolerance. 

The lack of springback measurements under carefully controlled conditions in the literature motivated
a series of draw/bend tests [14,19-20]. A set of central values of process parameters was chosen
to represent typical automotive panel forming. The "central" variable values and ranges were as follows:

Sheet  thickness:  1mm 

Tool  radius:  9.5mm  (3.2mm  -  25.4mm) 

R/t  (tool  radius/sheet  thickness):  10  (3  -  25) 

Back  force:  0.9  (0.5  -  1.3)  tensile  yield  force 
Friction  condition:  draw  lubricant  (dry,  rolling) 

The  friction  coefficients  were  not  determined  consistently  from  the  test,  but  values  in  the  range  of  0  to  0.25 
were  adopted  from  the  literature  for  comparison  simulations.  These  measurements  allow  evaluation  of  the 
dependence  of  springback  on  process  parameters,  under  conditions  approximating  sheet  forming  practice. 

In  early  studies  [19,21],  simulations  corresponding  to  the  physical  tests  were  carried  out  [22]  to  verify  the 
sensitivity  of  springback  to  these  process  parameters  without  confounding  changes  of  other  aspects.  It  became 
apparent  during  that  work  that  the  simulations  exhibited  considerable  non-physical  scatter  that  made 
meaningful  comparison  problematic.  The  simulations  have  since  been  optimized,  and  their  sensitivity  to  a 
variety  of  numerical  parameters  examined.  The  more  recent  examination  of  sensitivity  to  the  choice  of  finite 
element  and  material  model  will  be  presented  in  this  paper. 

Section 2 describes the draw/bend test and material properties. Section 3 presents the 2-D numerical
sensitivity studies of springback prediction; 2-D models are used because 3-D simulations are CPU
intensive, which makes a numerical sensitivity study with them prohibitive. The conclusions are general
enough for beam/shell type elements that they can be extended to 3-D shell analysis without concern.
Section 4 addresses the discrepancies between experiments and 2-D simulations, and shows good springback
prediction from optimized 3-D models. However, special care must be taken in choosing shell (or solid)
elements if the R/t ratio (tool radius/sheet thickness) is small (i.e., ~5). Section 5 shows the influence
of the Bauschinger Effect on springback prediction by 3-D shell elements with the help of
bending/unbending and tension/compression tests. A brief discussion is given in Section 6, and Section 7
presents the conclusions.

THE  DRAW/BEND  TEST 

The draw/bend test [14,19-20], Figure 1, consists of two hydraulic actuators oriented at a 90° angle, and a
fixed or rolling cylinder to simulate a tooling radius over which the strip sample (50mm wide) is drawn. The
upper actuator is programmed to provide a constant restraining force, Fb. The lower actuator is set to
displace at constant speed, v, thus drawing, bending/unbending, and possibly stretching the sample over the
cylinder. When the test is over, the sample is allowed to spring back by removing it from the grips.


Fig. 1. Schematic of the draw/bend test (left) and unloaded specimen shape after testing (right). (Labels:
Regions 1 to 4; tool radius R; draw distance Δx = 127 mm; draw speed v = 40 mm/s; springback angle Δθ.)




This  test  achieves  steady-state  after  a  small  initial  displacement,  and  is  mechanically  similar  to  the  die 
sidewall  region  of  a  stamping  operation.  Three  materials  were  tested:  DQSK  (draw-quality,  silicon-killed) 
steel,  a  common  material  for  automotive  stamping;  HSLA  (high-strength,  low-alloy)  steel;  and  6022-T4 
aluminum.  The  last  two  materials  are  of  interest  for  weight  savings.  For  reasons  of  space,  only  the  results 
with  6022-T4  and  DQSK  will  be  studied  in  the  current  presentation. 

A series of tensile tests was used to obtain the uniaxial strain-hardening material properties [19,20], as
follows:

6022-T4: Young's modulus, E = 69 GPa; Poisson's ratio, ν = 0.3; sheet thickness, 0.9 mm. The true stress
versus true strain curve is:

$\sigma = A\,[1 - B\exp(-C\varepsilon) - D\exp(-F\varepsilon)], \quad \varepsilon > 0.002$

where A = 389 MPa, B = 0.566, C = 8.44, D = 1.20, F = 1120.

DQSK: Young's modulus, E = 212 GPa; Poisson's ratio, ν = 0.3; sheet thickness, 1.5 mm. The true stress
versus true strain curve is:

$\sigma = A\,(B + \varepsilon)^{C} - D\exp(-F\varepsilon), \quad \varepsilon > 0.0035$

where A = 540 MPa, B = 0.00965, C = 0.274, D = 150 MPa, F = 1700.
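
As an illustration only, the two fitted hardening curves quoted above can be evaluated with a few lines of
Python. This is a minimal sketch: the function and dictionary names are hypothetical, the constants are
copied from the text, and the fits are only stated to hold above the quoted strain limits.

```python
import math

# Fitted constants as quoted in the text (stresses in MPa, strains dimensionless).
AL6022 = {"A": 389.0, "B": 0.566, "C": 8.44, "D": 1.20, "F": 1120.0}
DQSK = {"A": 540.0, "B": 0.00965, "C": 0.274, "D": 150.0, "F": 1700.0}


def flow_stress_6022(eps):
    """True stress (MPa) of 6022-T4; the fit is quoted for eps > 0.002."""
    p = AL6022
    return p["A"] * (1.0
                     - p["B"] * math.exp(-p["C"] * eps)
                     - p["D"] * math.exp(-p["F"] * eps))


def flow_stress_dqsk(eps):
    """True stress (MPa) of DQSK steel; the fit is quoted for eps > 0.0035."""
    p = DQSK
    return p["A"] * (p["B"] + eps) ** p["C"] - p["D"] * math.exp(-p["F"] * eps)
```

Calling, for example, `flow_stress_6022(0.05)` and `flow_stress_dqsk(0.05)` gives the tabulated-curve
values a simulation would need at 5% true strain; the 6022-T4 expression saturates toward A at large strain,
whereas the DQSK expression keeps hardening as a power law.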

2-D  SIMULATIONS 

Because the ratio of strip width to thickness (w/t) for the draw/bend specimens is about 50 for 6022-T4 and
30  for  DQSK,  the  plane  strain  model  was  expected  to  be  appropriate.  Therefore  our  original  analyses 
concentrated  on  2-D  modeling,  carried  out  using  ABAQUS  [23]  and  SHEET-S  [21,22,24,25].  In  ABAQUS, 
there  is  no  plane  strain  model  for  2-D  beam  elements.  Hence,  new  developments  were  required  in  our 
SHEET-S  program. 

SHEET-S  developments 

With  this  in  mind,  SHEET-S  [21,24,25]  was  modified  to  simulate  springback.  Two  co-rotational  2D  beam 
elements  [26]  were  implemented  with  an  elasto-plastic  formulation.  These  elements  have  two  nodes  with 
three  degrees  of  freedom  per  node.  One  is  based  on  the  Bernoulli  formulation  (without  shear)  and  the  other  is 
formulated  with  the  Timoshenko  theory  (with  shear  correction).  Only  results  obtained  with  the  Timoshenko 
element  are  presented  here,  although  it  should  be  noted  that  there  was  no  significant  difference  in  the  results 
for  most  simulations,  even  for  R/t  as  small  as  3.  All  finite  element  simulations  in  this  section  were  performed 
for  the  6022-T4  using  an  isotropic  hardening  law.  Step  size  was  found  to  have  little  or  no  effect  on  simulated 
springback  angle. 

State  of  stress  and  strain 

Based on the geometry (w/t of 50 for 6022-T4), plane strain should give a good approximation of the strain
state  in  the  draw/bend  test.  However,  the  plane  stress  model  always  produced  results  in  closer  agreement 
with  experiments  (Fig.2),  especially  for  low  back  forces.  The  difference  between  springback  angles  of  these 
two models is almost constant regardless of the applied back force (on the order of 15°). In the remaining simulations
for  the  numerical  sensitivity  studies,  the  plane  stress  model  was  chosen,  although  the  results  are  similar  for 
either  condition. 

The  principal  effect  (and  reason  for  better  plane  stress  results)  lies  in  the  occurrence  of  the  anticlastic 
curvature  which  arises  during  forming,  and,  in  some  cases,  persists  after  springback.  The  presence  of  this 
secondary  curvature  greatly  increases  the  section  moment  and  reduces  principal  springback  accordingly.  This 
will  be  discussed  in  Section  4. 




Fig. 2. Δθ for plane stress vs. plane strain models, plotted against normalized back force (Fb).


Unloading  procedure 

In order to simplify the problem, springback is often considered to be a purely elastic unloading (for
example, see [27-28] and many others). However, plastic deformation may occur during springback (Fig. 3),
in which case purely elastic unloading is not appropriate. Purely elastic unloading overstates the
springback angle relative to the elasto-plastic case. The amount of overestimation depends on the applied
back force, and the maximum difference can exceed 1/3 of the total (10° for a Δθ equal to 25°, Fig. 3).


Fig. 3. The influence of unloading: elastic vs. elasto-plastic unloading cases (plotted against normalized
back force, Fb).

Fig. 4. Stress distribution along the thickness of the sheet before and after springback for unloading
case 2 (plotted against normalized z-coordinate).


Figure  4  shows  through-thickness  stress  distribution  for  elastic  versus  elasto-plastic  unloading.  The 
drawn/bent/unbent  section  (Region  3)  exhibits  the  elasto-plastic  unloading  that  occurs  at  the  outer  fiber  of  the 
sheet  (normalized  z  coordinate  is  equal  to  0.5). 


Fig. 5. Unloading schemes.

Fig. 6. Moment distribution following the first unloading step for Schemes 1 and 2 (plotted against
element number).




Since elasto-plastic unloading is path dependent, three unloading "schemes" were compared for consistency
of  results.  Schemes  1  and  2  consist  of  two  steps  while  Scheme  3  is  accomplished  in  one  step.  The  specimen 
shapes  for  the  steps  in  these  schemes  are  shown  in  Figure  5. 


For Scheme 1, the stretching force at the end node of the sheet is released first with the tool contact in
place. The contact between the sheet and the cylinder disappears at the end of this step, at which point a
nearly constant moment distribution is obtained (Fig. 6). The second step consists of releasing the end
node by reducing the nodal forces to zero. For Scheme 2, the contact is removed by replacing the contact
boundary condition by equivalent nodal forces and reducing these forces to zero. During this step, there
is no real contact treatment: the process is a purely mechanical one. The moment distribution along the
sheet at the end of this step is quite different (Fig. 6). In the final unloading, the end node is freed by
reducing the nodal forces to zero. For Scheme 3, the contact equivalent forces and the nodal forces are
proportionally reduced to zero, with no contact treatment. Of these three unloading paths, Scheme 1 is
very CPU intensive because of the concurrent contact treatment, whereas Scheme 2 is more efficient than
Scheme 3. Δθ for these three unloading cases is 26.42°, 26.45° and 26.26° for Schemes 1, 2 and 3,
respectively. As will be shown later, these differences, in the range of 0.2°, are inconsequential
compared with numerical scatter. Thus, the choice of unloading scheme is immaterial, despite the path
dependence of elasto-plastic unloading.


Numerical  Sensitivity 

Early simulations of the draw/bend test revealed that careful attention to the numerical parameters of the
simulation was required. Values for these parameters that had been established for forming simulations
were no longer suitable for springback analysis, and unacceptable scatter was obtained. The numerical
errors arise from the number of integration points through the thickness, the number of elements in
contact with the tools, the convergence tolerances, the material hardening law, and the choice of 2-D or
3-D modeling. In this section, only the effects of the number of integration points, the number of
elements in contact with the die radius, and the convergence tolerance will be discussed. Other
parameters, for example the material hardening law and 3-D modeling, will be presented later in this paper.


Fig. 7. Effect of the number of elements (Nel) and integration points (NIP).

Fig. 8. Effect of the number of integration points on simulated springback angle.


Figure 7 illustrates the dramatic combined sensitivity of springback simulation to numerical aspects. To
represent a typical forming simulation, a mesh of 150 equal-size elements (4 in contact with the tool) and 5
integration points through the thickness was employed, and compared with 600 unequal-size elements (50 in
contact with the tool) and 51 integration points. The springback predictions given by the 150-element mesh
are clearly incorrect, and for larger back forces have the wrong sign.


In order to isolate the effects of these various parameters, we carried out a sensitivity study using
values that we found to be sufficient (600 elements, 51 integration points, and contact/equilibrium
tolerance of one part in 10^4), then varied each parameter independently.




Number  of  integration  points 

Most shell elements require numerical integration of stress through the thickness to obtain the internal
force vector, moments and tensile forces. This introduces numerical error into the results. For sheet
forming processes, 5 integration points are commonly specified [23,29] and good results are usually
reported. However, our results show that springback is very sensitive to the number of integration points,
and that many more, by nearly an order of magnitude, are required because of the need to locate precisely
the elastic core in the sheet.


Figure 8 shows the effect of the number of integration points through the thickness (NIP) for R/t=10 and
Fb=0.5 or 0.9 [21]. For a given R/t and Fb, if a percentage error of less than, for example, 1% is
required, NIP depends strongly on the overall springback angle: NIP>21 for Fb=0.5 (Δθ=55°) and NIP>35 for
Fb=0.9 (Δθ=25°). If the error in springback angle is required to be less than 1 degree, then NIP>15 is
sufficient for both cases. For a total Δθ of about 25° (R/t=10, Fb=0.9), the numerical error caused by an
insufficient number of integration points can be as high as 5° (comparing 5 with 151 integration points),
or about 1/5 (20%) of the total Δθ. An even larger scatter (about 7°) occurs as the number of integration
points is varied within the normally recommended range of 5 to 9.
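
The need to resolve the elastic core can be illustrated with a deliberately simplified calculation. The
sketch below is not the SHEET-S formulation: it assumes pure bending without tension, an
elastic-perfectly-plastic material, and equally weighted mid-point layers as the integration rule, and the
yield stress of 150 MPa is a hypothetical value chosen only for illustration. It compares the numerically
integrated bending moment with the closed-form result as the number of integration points changes.

```python
def elastic_core_fraction(young_mpa, yield_mpa, thickness_mm, radius_mm):
    """Half-height of the elastic core divided by the half-thickness.

    Assumes bending to curvature 1/radius with the neutral axis at mid-thickness.
    """
    outer_fiber_elastic_stress = young_mpa * (thickness_mm / 2.0) / radius_mm
    return yield_mpa / outer_fiber_elastic_stress


def moment_exact(yield_mpa, zeta_e):
    """Integral of stress * zeta over zeta in [-1, 1], valid for zeta_e <= 1."""
    return yield_mpa * (1.0 - zeta_e ** 2 / 3.0)


def moment_midpoint(yield_mpa, zeta_e, n_ip):
    """Same integral evaluated with n_ip equally weighted mid-point layers."""
    total = 0.0
    for i in range(n_ip):
        zeta = -1.0 + (2.0 * i + 1.0) / n_ip              # layer mid-point
        stress = yield_mpa * max(-1.0, min(1.0, zeta / zeta_e))
        total += stress * zeta * (2.0 / n_ip)             # stress * lever arm * weight
    return total


def moment_percent_error(yield_mpa, zeta_e, n_ip):
    """Percentage deviation of the layered integral from the closed form."""
    exact = moment_exact(yield_mpa, zeta_e)
    return abs(moment_midpoint(yield_mpa, zeta_e, n_ip) - exact) / exact * 100.0


# E and t for 6022-T4 and the 9.5 mm tool radius are from the text; 150 MPa is assumed.
ZETA_E = elastic_core_fraction(69000.0, 150.0, 0.9, 9.5)
```

With these assumed inputs the elastic core occupies only a few percent of the thickness, so a 5-point rule
places no sampling point inside it, while a 51-point rule does. The moment error is not the same quantity as
the springback-angle error plotted in Figure 8, so the sketch indicates the trend only.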

Number  of  elements 

The extreme case of the influence of the number of elements on springback prediction is shown in Figure 7,
which used a mesh of 150 equal-size elements and NIP=5 for R/t=10 and Fb=0.9. A wrong springback
prediction occurred again with a mesh of 300 equal-size elements for R/t=4. This wrong-sign springback
prediction, which in this case is an example of extreme numerical scatter, has also been observed by others
[17,26]. In both cases, the number of nodes in contact with the tooling (cylinder), Ncn, is less than 5
(about 4 nodes for R/t=10 and about 3 nodes for R/t=4).


Table 1. Typical mesh parameters for R/t=10 (Part 2, L2 = 140 mm).

  Nel      Elements in Part 2    Element length (mm)    Turning angle per element
  150      106                   1.3                    7.8°
  300      221                   0.63                   3.8°
  600      452                   0.31                   1.9°
  900      684                   0.2                    1.2°
  1200     914                   0.15                   0.9°
  2400     1838                  0.08                   0.5°

In order to examine numerical error for a variety of simulations more systematically, we defined a
"reference" simulation result for a given physical problem, and defined error as the deviation of the
simulated Δθ from this result. The error may be expressed more compactly either in terms of absolute
angle, |δΔθ|, or absolute percent, |δΔθ|/Δθref × 100, without specifying whether the simulation over- or
under-estimates the reference result.

To study the influence of element size, the unequal-size meshes listed in Table 1 were constructed (only
the mesh parameters for Part 2 are listed here; for details of the other parts, see [21]). In Regions 1
and 4 (originally flat and remaining so after springback), coarse meshes are used. The turning angles and
lengths of Part 2, which includes Regions 2 and 3 (originally flat and making contact during the draw),
refer to the standard tooling radius of 9.5mm. For other cases, the turning angles can be scaled from the
standard case. For the region initially in contact with the die, the number of elements is fixed at 10 for
all cases. The length of each element is thus 1.5mm for R/t=10, and the lengths for other tool radii can
be obtained proportionally.
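
The mesh parameters discussed above can be cross-checked with a short calculation. The sketch below assumes
that the element length is the Part 2 length divided by the number of Part 2 elements, and that the turning
angle per element is that length divided by the tool radius; both relations are assumptions that
approximately reproduce the tabulated values, not formulas stated in the text. The function names are
hypothetical.

```python
import math

FREY_WENNER_LIMIT_DEG = 10.0   # at least one element per 10 degrees of contact angle [16]


def element_length_mm(part_length_mm, n_elements):
    """Average element length for a uniformly divided part."""
    return part_length_mm / n_elements


def turning_angle_deg(length_mm, tool_radius_mm):
    """Angle subtended by one element wrapped on a cylinder of the given radius."""
    return math.degrees(length_mm / tool_radius_mm)


def satisfies_guideline(part_length_mm, n_elements, tool_radius_mm):
    """True when the per-element turning angle is within the 10 degree guideline."""
    angle = turning_angle_deg(element_length_mm(part_length_mm, n_elements),
                              tool_radius_mm)
    return angle <= FREY_WENNER_LIMIT_DEG
```

For instance, `turning_angle_deg(element_length_mm(140.0, 221), 9.5)` evaluates to roughly 3.8°, in line
with the 300-element row of Table 1.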

Figure 9 illustrates the combined effect of the number of elements and the number of integration points
through the thickness for R/t=10, Fb=0.9 and R/t=28, Fb=0.9, respectively. In all cases, a 900-element mesh
with NIP=51 was used as reference, with the choice of R/t=10 or 28 distinguishing the two reference
calculations. From Figure 9, the following remarks can be made:

a)  The errors settle down to variations of less than 1° (for a fixed number of elements) with NIP>25,
approximately independent of springback angle.




b)  It is obviously difficult or even impossible to achieve an error under a 1% tolerance, especially for
small springback cases.

c)  In terms of the number of elements required, all the meshes are able to achieve 1° accuracy for the
cases presented, whereas for percentage error, the results vary widely. The high springback case
(Figure 9a, Δθref=24.7°) requires 600 elements to achieve 1% error, whereas the low springback case
(Figure 9b, Δθref=3°) cannot reduce the error to 1% even with a 600-element mesh. (Thus, it may be argued
that a mesh of more than 900 elements should be used as the reference here.)


Fig. 9. Dependence of numerical simulation error (from reference calculation) on simulation parameters:
(a) R/t=10, Fb=0.9; (b) R/t=28, Fb=0.9.


When there are enough elements in contact with the tool (approximately 10 elements in the worst case shown
here), |δΔθ| is always less than 1 degree. If we only require that the error in Δθ be less than 1°, then
for R/t=10, the mesh with 150 elements is sufficient. However, for a tighter angular tolerance specified
as a percentage of total springback, the small springback cases may require more than 900 elements, or many
times more contact elements. Clearly, it is simpler and more consistent to specify error in terms of angle
rather than percentage.
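
The difference between the two error measures is easy to see numerically. The following sketch uses
hypothetical function names and simply applies the two definitions given earlier (absolute angle and
absolute percent of the reference springback) to the two reference angles quoted for Figure 9.

```python
def absolute_angle_error(delta_theta_deg, reference_deg):
    """|delta(Delta theta)|: absolute deviation from the reference result, in degrees."""
    return abs(delta_theta_deg - reference_deg)


def percent_angle_error(delta_theta_deg, reference_deg):
    """Absolute deviation expressed as a percentage of the reference springback."""
    return absolute_angle_error(delta_theta_deg, reference_deg) / abs(reference_deg) * 100.0


# The same 1 degree deviation applied to the two reference angles quoted in the text.
high_case = percent_angle_error(24.7 + 1.0, 24.7)   # high-springback reference
low_case = percent_angle_error(3.0 + 1.0, 3.0)      # low-springback reference
```

A fixed 1° deviation is about 4% of the high-springback reference but about 33% of the low-springback
reference, which is why a percentage tolerance becomes hard to satisfy when the springback itself is small.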


Numerical  tolerances 

In the SHEET programs, a step solution is considered converged when both the equilibrium and contact
conditions are satisfied. Equilibrium is achieved when the total unbalanced force norm divided by the total
force norm, and the iterative incremental displacement norm divided by the incremental displacement norm,
are below the prescribed tolerances:

$\frac{\|F_{int} - F_{ext}\|}{\|F_{ext}\|} < T_F \quad \text{and} \quad \frac{\|\delta\Delta u_n\|}{\|\Delta u\|} < T_u \qquad (1)$

where ‖·‖ denotes the Euclidean norm of a vector and TF, Tu are the tolerances.
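
A convergence test of this kind can be sketched as follows. This is an illustrative reading of the
equilibrium criterion, not the SHEET-S source: the function and argument names are hypothetical, the
unbalanced force is assumed to be the difference between internal and external force vectors, and the
default tolerance of 1e-4 is the value the text reports adopting.

```python
import math


def euclidean_norm(vector):
    """Euclidean norm of a sequence of floats."""
    return math.sqrt(sum(x * x for x in vector))


def equilibrium_converged(f_int, f_ext, du_iter, du_incr, tol_f=1e-4, tol_u=1e-4):
    """Return True when both the force and the displacement ratios are below tolerance.

    f_int, f_ext : internal and external nodal force vectors (assumed names)
    du_iter      : displacement correction from the latest iteration
    du_incr      : accumulated displacement increment for the current step
    The caller is expected to pass non-zero f_ext and du_incr.
    """
    residual = [a - b for a, b in zip(f_int, f_ext)]
    force_ratio = euclidean_norm(residual) / euclidean_norm(f_ext)
    disp_ratio = euclidean_norm(du_iter) / euclidean_norm(du_incr)
    return force_ratio < tol_f and disp_ratio < tol_u
```

Using relative rather than absolute norms keeps the test independent of the units and of the magnitude of
the applied back force.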

The contact conditions are satisfied if, for each contact node, both the contact force divided by the
force norm, and the penetration distance (d_penetration), measured along the mesh normal [23,24] and
divided by a normalizing length (either the element length or the nodal displacement norm), are within the
imposed tolerances:

$\frac{F_c}{\|F\|} > -T_{CF} \quad \text{and} \quad \frac{d_{penetration}}{\|D_{norm}\|} \le T_{Cd} \qquad (2)$

where TCF and TCd are the tolerances for contact force and penetration distance, respectively.


In the simulations presented here, identical values for TF, Tu, TCF, and TCd are enforced, with typical
results shown in Figure 10. For convergence tolerances ranging from 10^-3 to 10^-8, the percentage error is
less than 1%, which is negligible compared to the error induced by insufficient NIP and NEL. The value
adopted in the simulations presented is 10^-4, chosen to avoid any significant error caused by tolerance
in convergence or contact.




Fig. 10. Influence of tolerances on Δθ and % error.


Sensitivity  To  Process  Parameters 

In the sensitivity studies of the process parameters, the following base values were used: 600
unequal-size elements, 51 integration points through the thickness, and 10^-4 tolerances. These values
were found to be sufficient to limit numerical error to less than 1°. In order to reduce springback, the
most common and effective strategy in practice is to increase tensile stress in the sheet. In sheet
forming applications, the typical applied Fb is in the range of 0.5-1.1 of the tensile yield force. A
normalized back force of Fb = 1.1 was found to be the upper limit to avoid strain localization with
6022-T4 aluminum.


Figure 11 shows the predicted springback angles versus back restraining force, compared with the experimental data. Springback decreases steadily as the restraining force increases. However, the experimental data show a rapid decrease as the normalized back force approaches 0.9, while the numerical results are nearly linear. This discrepancy arises from anticlastic curvature that develops during unbending and persists after springback, as examined in the 3-D simulations presented later.

Friction 

The  role  of  friction  on  springback  was  investigated  under  three  experimental  conditions:  1)  low  friction  - 
lubricated  /  free  rolling  die  cylinder;  2)  medium  friction  -  lubricated  fixed  die;  3)  high  friction  -  unlubricated 
fixed  die.  Parco  404  [30],  a  standard  industrial  forming  lubricant,  was  used  for  the  lubricated  cases  [19,22]. 

A common method to determine the friction coefficient is the rope formula, which uses the front and back force values measured during the tests [31]. However, values obtained in this way are widely scattered. Instead, in the simulations, the friction coefficients were estimated from past experience as follows: μ=0 for the lubricated free roller, μ=0.15 for the lubricated case, and μ=0.25 for the dry case. (Note: these estimates only affect the placement of the experimental results, and small errors are not very significant in springback determination.)
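
The rope formula is commonly written as the capstan relation F_front = F_back · exp(μθ), where θ is the wrap angle over the tool radius. The short Python sketch below inverts that relation for μ. The function name and the 90° wrap angle in the usage note are assumptions for illustration; this simple form also ignores the force needed to bend and unbend the strip.

```python
import math


def rope_formula_mu(front_force, back_force, wrap_angle_rad):
    """Friction coefficient from the capstan (rope) relation.

    Assumes F_front = F_back * exp(mu * theta), i.e. a perfectly flexible
    strip; bending and unbending forces are not accounted for.
    """
    if front_force <= 0 or back_force <= 0 or wrap_angle_rad <= 0:
        raise ValueError("forces and wrap angle must be positive")
    return math.log(front_force / back_force) / wrap_angle_rad
```

For example, with a hypothetical 90° wrap (θ = π/2), a front force equal to `back_force * math.exp(0.15 * math.pi / 2)` returns μ = 0.15 to within rounding.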


of  springback  angle  to  friction  coefficient. 




Figure  12  illustrates  the  effect  of  friction  coefficients  for  simulations  and  experimental  data.  Springback 
decreases  as  friction  increases,  and  the  role  of  friction  increases  considerably  as  back  force  is  increased. 

Normalized  bending  ratio  (R/t) 

The  influence  of  the  R/t  ratio  on  springback  has  been  studied  by  many  authors  [9,11,26,31,32],  who  have 
been  nearly  equally  divided  about  the  direction  of  the  effect.  Four  R/t  cases  were  simulated  for  a  fixed, 
normalized back force (Fb=0.9) with lubricated tooling (μ=0.15).

In  a  typical  sheet  forming  range,  R/t  has  a  modest  effect  on  springback  for  this  combined  stretch  and  draw 
condition  (Figure  13).  The  direction  of  the  R/t  effect  is  contrary  to  some  reports  in  the  literature  [12,27,32] 
and consistent with others [10,33]. In the experiments and simulations with steel cases described in Section
4,  the  effect  for  R/t  was  seen  to  reverse  below  a  critical  value  near  R/t  =  5. 

3-D  SIMULATIONS 

In the 2-D simulations (Figures 11-12), the predicted springback decreases nearly linearly with an increase of restraining force, but the experimental data show a rapid decrease as the normalized back force approaches 0.9. The error for Fb=0.9 (Δθ=6° experiment vs. Δθ=26° simulation) is nearly 400%. In order to understand this discrepancy, 3-D shell and solid elements were employed. 4-node shell elements were used with a mesh of 4x300 elements (4 elements in the width, 300 unequal-size elements in the length), with 51 through-thickness integration points; and 8-node solid elements (C3D8R [23]) with a mesh of 4x300x6 elements (4 elements in the width, 150 unequal-size elements in the length and 6 elements in the thickness).


Fig. 14. Effect of restraining force on Δθ for various FEA models.
Fig. 15. The anticlastic curvature of Regions 1 to 4.

Figure  14  shows  the  springback  angles  versus  back  restraining  forces  for  R/t=10.  It  is  evident  that  the  results 
given  by  shell  elements  agree  with  experimental  data,  while  the  answers  given  from  the  8-node  solid  are  very 
similar  to  the  results  obtained  by  the  plane  strain  model. 

The  origin  of  the  differences  between  2-D  simulations  and  the  experiments  for  normalized  back  forces  greater 
than  0.9  can  be  understood  in  terms  of  anticlastic  curvature,  the  secondary  curvature  that  occurs  normal  to 
principal  bending  in  thin  sheets.  In  the  elastic  case,  it  is  a  product  of  Poisson  contraction. 

Although anticlastic curvature is usually described as a result of bending under plane stress [34] (e.g., w/t=10), Figure 15 shows that cross-sections in Region 2 (Fig. 1), the bending region, along with Regions 1 and 4, remain nearly flat. Region 2 is presumably constrained by the tensile stress and high contact pressure with the tooling. Region 3, the unbending region of the sheet or "sidewall curl" area, exhibits anticlastic curvature approaching the principal curvature of Region 3 (after unbending). The unloaded Region 3 principal curvature is 0.009 mm^-1 (0.23 in^-1) for the Fb=0.5 case and 0.002 mm^-1 (0.05 in^-1) for the Fb=0.9 case.
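
As a quick check on the quoted curvatures, the metric and imperial values can be related with 1 in = 25.4 mm, and each curvature can be restated as a radius. The snippet below is only a unit-conversion sketch; the helper names are made up for this example.

```python
MM_PER_INCH = 25.4


def curvature_mm_to_inch(kappa_per_mm):
    """Convert a curvature from 1/mm to 1/in."""
    return kappa_per_mm * MM_PER_INCH


def radius_mm(kappa_per_mm):
    """Radius of curvature in mm for a curvature given in 1/mm."""
    return 1.0 / kappa_per_mm


for label, kappa in (("Fb=0.5", 0.009), ("Fb=0.9", 0.002)):
    print(label,
          round(curvature_mm_to_inch(kappa), 2), "1/in,",
          round(radius_mm(kappa)), "mm radius")
```

The two cases correspond to radii of roughly 111 mm and 500 mm, respectively.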




Fig. 16. Δθ vs. Fb for various FEA models with R/t=6.35 (t0=1.5 mm). Panels (a) and (b); horizontal axis: Normalized Back Force, Fb.


Figure  16a  shows  that  the  anticlastic  curvature  in  Region  3  is  high  and  nearly  insensitive  to  back  force  during 
loading,  but  after  unloading  it  takes  a  sudden  jump  at  Fb  greater  than  or  equal  to  0.9.  This  jump  matches 
closely  the  experimentally  measured  values.  Furthermore,  this  jump  in  anticlastic  curvature  is  matched 
precisely  by  a  sudden  decrease  in  springback  angle  (Fig.  16b),  for  3-D  simulations  and  experiments.  Thus,  the 
origin  of  the  sudden  decrease  of  springback  angle  for  Fb  above  0.9  lies  with  the  persistence  of  anticlastic 
curvature  following  unbending.  The  persistence  of  this  curvature  increases  the  effective  section  modulus  for 
principal  bending  and,  for  a  fixed  applied  moment,  reduces  the  springback  angle  correspondingly. 
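
The stiffening effect of a transverse (anticlastic) curvature can be illustrated with a purely geometric, elastic estimate: the second moment of area of a thin strip whose cross-section is a circular arc, compared with the same strip when flat. The Python sketch below is not the authors' calculation. The strip width (50 mm) and thickness (1 mm) are hypothetical, and the transverse curvature of 0.002 mm^-1 is borrowed from the Region 3 principal curvature quoted earlier purely as a plausible magnitude.

```python
import numpy as np


def second_moment_of_area(width, thickness, transverse_curvature=0.0, n=2000):
    """Second moment of area of a thin strip with a circular-arc cross-section.

    The section is split into n short segments along its arc length. Each
    segment contributes its offset from the centroid (parallel-axis term)
    plus its own local term, thickness**3 / 12 * cos(phi)**2 per unit length.
    """
    ds = width / n
    s = (np.arange(n) + 0.5) * ds - width / 2.0   # segment midpoints
    if transverse_curvature == 0.0:
        z = np.zeros(n)
        cos_phi = np.ones(n)
    else:
        rho = 1.0 / transverse_curvature
        phi = s / rho                              # angle along the arc
        z = rho * (1.0 - np.cos(phi))              # height above the arc bottom
        cos_phi = np.cos(phi)
    z_bar = z.mean()                               # centroid (uniform thickness)
    offset_term = thickness * np.sum((z - z_bar) ** 2) * ds
    local_term = thickness ** 3 / 12.0 * np.sum(cos_phi ** 2) * ds
    return float(offset_term + local_term)


flat = second_moment_of_area(50.0, 1.0)
curved = second_moment_of_area(50.0, 1.0, transverse_curvature=0.002)
print(f"flat: {flat:.3f} mm^4, curved: {curved:.3f} mm^4, ratio: {curved / flat:.2f}")
```

Under these assumed dimensions even a shallow transverse arc raises the second moment of area by several tens of percent, which is consistent in direction with the reduced springback described above.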


Note  that  in  Figure  14,  the  3-D  solid  simulation  did  not  show  the  presence  of  persistent  anticlastic  curvature, 
nor  the  accompanying  sudden  change  of  springback  angle.  The  results  were  essentially  the  same  as  for  plane 
strain  simulation.  Since  8-node  solid  elements,  which  have  linear  shape  functions,  are  very  stiff  in  bending 
modes,  the  simulations  were  repeated  with  20-node  nonlinear  3-D  solid  elements  (C3D20  [23]).  Because  of 
the  large  CPU  times  involved  (15-20  hours),  a  mesh  of  2x150x3  (2  in  the  width,  150  in  the  length,  and  3  in 
the thickness) was used. The results, also shown in Figure 16b, reproduce the main features of the 3-D shell
simulation.  Subsequent  results  (Figures  17-20)  are  shown  for  both  kinds  of  elements  and  meshes. 


Fig. 17. Δθ vs. R/t for various FEA models with Fb=0.9.
Fig. 18. Δθ vs. Fb for various FEA models with ...
Fig. 19. Dependence of anticlastic curvature on back force.
Fig. 20. Δθ vs. R/t for various FEA models with Fb=0.9.




Figures 17-20 show that good agreement between experiments and simulations with both elements (3-D shell, 3-D/20-node solid) is obtained for the larger R/t ratios tested. However, for the DQSK material (Fig. 20), there is a clear departure of experimental results from the shell simulations at R/t less than 6. As R/t is decreased below 6, the springback angle decreases rapidly, contrary to the shell simulations. The 3-D solid elements, however, reproduce this change well. The difference presumably lies in the basic theory upon which the shell elements are based, namely that the bending strain is assumed to be distributed linearly through the thickness.

BAUSCHINGER  EFFECT 

As shown in Figures 14 and 16, the simulated springback angles are very close to the experimental ones for high back forces, while more difference is apparent for low back restraining forces. A possible source of error in the simulation lies in the use of the isotropic hardening law, which may be inadequate to approximate the real material strain-hardening behavior for some alloys. In the draw/bend test (and in real sheet forming), reverse loading occurs because bending and unbending occur when the sheet strip is pulled over the radius and straightened back after leaving the radius. For materials which exhibit a Bauschinger effect, yield stress in reverse loading is usually lower than for continuous loading. The isotropic hardening model offers a reasonable fit to material hardening for proportional loading paths, but may not account for this effect. Because springback is very sensitive to the moment/stress at the end of forming, for material that exhibits a Bauschinger effect, using an anisotropic hardening law instead of the isotropic one could give a more realistic stress/moment distribution for bending and unbending processes, improving the accuracy of the prediction.

Two kinematic-type hardening laws were implemented in the finite element code to account for the Bauschinger effect. One is the mixed hardening law presented in [26], in which a scalar m is defined to divide the total plastic strain into an isotropic part and a kinematic part. This hardening law reproduces the lower reverse yield stress and permanent softening very well; however, it cannot show the smooth elastic-plastic transition. On the other hand, the nonlinear kinematic hardening law proposed by Chaboche [35] does show this gradual transition on reverse loading; however, it needs advanced modification to reproduce the permanent softening. The simulated springback angles with these two kinematic-type hardening laws are very sensitive to the material parameters. Depending on the choice of material parameters, the springback angle could vary significantly.
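
How a mixing factor m changes the reverse yield stress can be seen in a one-dimensional sketch with linear hardening. This is a textbook-style illustration rather than the implementation used in the paper. It assumes the convention that m = 1 is purely isotropic and m = 0 purely kinematic, a constant hardening modulus, and made-up material values.

```python
def mixed_hardening_yield(sigma_y0, hardening_modulus, plastic_strain, m):
    """Forward flow stress and reverse yield stress magnitude after tension.

    1-D linear mixed hardening (assumed form):
      isotropic part : yield surface radius grows by m * H * eps_p
      kinematic part : back stress shifts by (1 - m) * H * eps_p
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie between 0 and 1")
    radius = sigma_y0 + m * hardening_modulus * plastic_strain
    back_stress = (1.0 - m) * hardening_modulus * plastic_strain
    forward_flow = back_stress + radius       # stress reached in tension
    reverse_yield = radius - back_stress      # magnitude of compressive yield
    return forward_flow, reverse_yield


# Hypothetical values: 150 MPa initial yield, H = 1000 MPa, 5% plastic strain.
for m in (1.0, 0.5, 0.0):
    fwd, rev = mixed_hardening_yield(150.0, 1000.0, 0.05, m)
    print(f"m={m}: forward flow {fwd:.0f} MPa, reverse yield {rev:.0f} MPa")
```

With m = 1 the reverse yield equals the forward flow stress (no Bauschinger effect); as m decreases, the reverse yield drops while the forward curve is unchanged, which is why the springback prediction depends so strongly on the chosen parameters.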

To determine the material parameters needed for describing the Bauschinger effect, direct in-plane tension-compression tests were performed [36]. In this test, the sheet specimen is sandwiched between two sets of forks. Clamping force is applied on the sides of the forks to prevent buckling of the sheet sample during compression, and axial force is applied using a standard tensile testing machine. The stress-strain data obtained directly from the device were corrected to account for the friction between the sheet specimen and the fork surface, and for the biaxial stress state caused by the clamping force normal to the sheet surface. Preliminary experiments were carried out in three stages: tension-compression-tension, or compression-tension-compression. The results show the lower yield stress and permanent softening on the reversed loading curve for the PNGV materials HSLA, AL6022-T4 and DQSK (Fig. 21). The mixed hardening model and the nonlinear kinematic model were used to fit the experimental tension-compression curve; material parameters were determined by trial and error. The results (Fig. 22) show that material parameters (such as the mixing factor m) have to be functions of effective strain in order to reproduce the experimental result reasonably well.


Fig. 21. True stress vs. true strain curve for tensile/compression test: experiment and simulation.
Fig. 22. Curve of mixed hardening control parameter m vs. plastic strain.




With the experimental data available from the tension/compression tests, the mixed hardening law was used to simulate the draw/bend test of 6022-T4 (Fig. 23). For low back restraining forces, the simulated
springback  angles  are  considerably  closer  to  the  experimental  data,  while  for  high  back  restraining  forces, 
they  move  slightly  away  from  the  test  results.  While  definitive  conclusions  are  difficult  at  this  preliminary 
stage,  it  appears  the  remaining  discrepancies  are  attributable  to  material  law  complexity  under  reverse 
loading. 


Fig.  23.  Effect  of  hardening  law  on  the  simulated  springback  angle 


DISCUSSION 

The  numerical  sensitivity  studies  indicate  that  stable,  reproducible  numerical  results  can  be  obtained  if  the 
mesh  size,  contact  and  equilibrium  tolerance,  and  through-thickness  integration  scheme  are  chosen  properly. 
Furthermore,  the  variation  of  springback  angle  with  typical  physical  process  variables  (friction  coefficient, 
bending  ratio,  and  tensile  force)  is  consistent  between  simulations  and  measurements.  Thus  it  appears  that, 
with  care,  FEA  can  be  used  to  predict  systematic  springback  effects  consistently  and  accurately. 

With this in mind, it is necessary to note that if the element is not chosen properly, the simulated springback angles can differ greatly from measured ones (the extreme cases are shown in Figures 2-3, 7, 11-12, 14, and 20).
Anticlastic  curvature  appears  during  unbending  of  the  sheet  after  drawing  over  the  die  radius  even  when  w/t  is 
over  50.  For  higher  back  forces  (Fb>0.9),  this  secondary  curvature  persists  after  unloading  and  the  increased 
section  moment  greatly  reduces  springback.  3-D  shell  elements  or  3-D  nonlinear  solid  elements  are  needed  to 
simulate  this  effect. 

For  small  R/t  values  (less  than  about  5),  shell  elements  are  no  longer  valid  for  springback  analysis  and  solid 
elements  are  required.  Mixed  solid/shell  elements  may  be  a  promising  approach  to  analyse  springback  [37]. 

CONCLUSIONS 

Simulations  and  experiments  of  draw-bending  of  6022-T4  aluminum  over  a  typical  range  of  process  variables 
were  carried  out.  The  following  conclusions  can  be  made. 

In  terms  of  numerical  sensitivity: 

•  Whereas  typical  forming  simulations  are  acceptably  accurate  with  5  to  9  points,  springback  analysis 
requires  up  to  51  points,  and  more  typically  25  points. 

•  A  sufficient  number  of  contact  nodes  is  also  critical,  approximately  one  node  per  5-1  O'3  of  turn  angle. 
When  nodes  are  separated  by  20°  of  turn  angle  or  more,  simulated  springback  can  occur  in  the 
opposite  direction. 

•  Convergence  and  contact  tolerances  must  be  enforced  carefully,  but  values  of  one  part  in  10,000, 
which  are  typical  of  implicit  forming  simulations,  are  sufficient. 

•  3-D  shell  and  non-linear  solid  elements  are  preferred  in  springback  prediction  even  for  large  w/t  ratio 
(in  our  case  greater  than  50)  because  of  the  presence  of  the  anticlastic  curvature. 

•  For  small  R/t  ratio  (about  5),  only  3-D  non-linear  solid  elements  can  accurately  predict  springback. 

In  terms  of  physical  sensitivity: 

•  Springback  decreases  with  increasing  tensile  stress,  up  to  about  90%,  or  50°.  This  effect  dominates. 




•  Springback  decreases  significantly  (3-7°,  approximately  25%)  with  increasing  R/t  ratio,  for  R/t  greater 
than  approximately  3. 

•  Friction  has  a  modest  but  measurable  effect  on  springback  in  a  typical  industrial  range  (3-7°,  or  5-20%), 
larger  for  low  tensile  stress. 

ACKNOWLEDGMENTS 

The financial support of a PNGV subcontract via NIST/ATP and the Center for Advanced Materials and Manufacturing of Automotive Components (CAMMAC) is gratefully acknowledged. Experimental data and early simulations for draw/bend tests were provided by William Carden, D. K. Matlock and Wendy P. Carden. Experimental data for the Bauschinger effect were provided by Y. Shen and V. Balakrishnan. Computer time was provided by the Ohio Supercomputer Center (PAS 080).

REFERENCES 

1. Proceedings of the 2nd International Conference NUMISHEET'93 - Numerical Simulation of 3-D Sheet Metal Forming Processes, 1993.

2. Proceedings of the 5th International Conference on Numerical Methods in Industrial Forming Processes, NUMIFORM'95 - Simulation of Materials Processing: Theory, Methods and Applications, 1995.

3.  Proceedings  of  the  3rd  International  Conference  NUMISHEET'96  -  Numerical  Simulation  of  3-D  Sheet 
Metal  Forming  Processes,  1996. 

4.  Proceedings  of  the  6th  International  Conference  on  Numerical  Methods  in  Industrial  Forming  Processes, 
NUMIFORM'98  -  Simulation  of  Materials  Processing:  Theory,  Methods  and  Applications,  1998. 

5. Zhang, D. Lee, 1995. Effect of Process Variables and Material Properties on the Springback Behavior of 2-D Draw Bending Parts, Automotive Stamping Technology, 11-18.

6.  M.K.  Mickalich,  M.L.  Wenner,  1988.  Calculation  of  Springback  and  its  Variation  in  Channel  Forming 
Operations,  Symp.  Proc.  for  the  March  3,  Soc.  of  Automotive  Engineers  Meeting. 

7.  M.L.  Wenner,  1983.  On  Work  Hardening  and  Springback  in  Plane  Strain  Draw  Forming.  J.  Applied 
Metal  Working,  2(4). 

8.  F.  Morestin,  M.  Boivin,  "On  the  necessity  of  taking  into  account  the  variation  in  the  Young  Modulus 
with  plastic  strain  in  elastic-plastic  software",  Nuclear  Engineering  and  design,  162,  1996,  107-116 

9.  S.C.  Tang,  "Application  of  an  anisotropic  hardening  rule  to  springback  prediction",  Advanced 
Technology  of  Plasticity,  (1996)  719-722 

10. Kuwabara, S. Takahashi and K. Ito, 1996. Springback Analysis of Sheet Metal Subjected to Bending-Unbending under Tension, Part I and II, Advanced Technology of Plasticity, 743-750.

11.  Kunio  Miyauchi,  1992.  Deformation  path  effect  on  stress-strain  relation  in  sheet  metals,  J.  Materials 
Processing  Technology,  34,  195-200. 

12. M.L. Wenner, 1982. An Analysis of Springback on the Punch Corner Radius in Channel Forming,
(General  Motors  Research  Report),  May  19,  1982. 

13.  M.  Sunseri,  J.  Cao,  A.P.  Karafillis,  M.C.  Boyce,  "Accommodation  of  Springback  Error  in  Channel 
Forming  Using  Active  Binder  Force  Control:  Numerical  Simulations  and  Experiments",  Transactions  of 
ASME,  vol.  118,  July,  1996. 

14.  Vallance  and  D.  K.  Matlock,  1992.  Application  of  the  Bending-Under-Tension  Friction  Test  to  Coated 
Sheet  Steels.  J.  Material  Engineering  and  Performance,  1(5),  685-693. 

15.  L.C.  Zhang,  G.  Lu  and  S.C.  Leong,  1997.  V-shaped  sheet  forming  by  deformable  punches.  J.  of 
Materials  Processing  Technology,  63,  134-139 

16. W. H. Frey, M.L. Wenner, 1987. Development and Applications of a One-dimensional Finite Element Code for Sheet Metal Forming Analysis, (General Motors Research Report), September.

17.  Mattiasson,  A.  Strange,  P.  Thilderkvist,  A.  Samuelsson,  1995.  Simulation  of  springback  in  sheet  metal 
forming.  5th  International  Conf.  on  Numerical  Methods  in  Industrial  Forming  Process,  NY,  115-124 

18.  N.  He,  R.H.  Wagoner,  1996.  Springback  Simulation  in  Sheet  Metal  Forming.  NUMISHEET  96,  eds.  J. 
K.  Lee,  G.  L.  Kinzel,  R.  H.  Wagoner,  Ohio  State  University,  308-315. 

19. R.H. Wagoner, W.D. Carden, W.P. Carden, D.K. Matlock, 1997. Springback after drawing and bending of metal sheets, THERMEC '97, Australia.

20. W.D. Carden, 1997. Springback after drawing and bending of metal sheets. MS thesis, MSE/OSU.

21.  K.  Li  and  R.H.  Wagoner,  1998.  Simulation  of  springback.  NUMIFORM'98, 21-31. 




22.  W.P.  Carden,  1997.  Analysis  of  Springback  in  Draw-bending  Forming,  MS  Thesis,  MSE/OSU. 

23.  ABAQUS  User  Manual,  version  5.5 

24.  M.J.  Saran,  R.H.  Wagoner,  1991.  "A  Consistent  Implicit  Formulation  for  Nonlinear  Finite  Element 
Modeling  with  Contact  and  Friction,  Pt.  1  -  Theory",  ASME  Trans.  -  J.  Appl.  Mech.,  58,  499-506. 

25. D.J. Zhou, R.H. Wagoner, 1995. Development and application of Sheet-Forming Simulation, J. of Materials Processing Technology, 50, 1-16.

26.  M.A.  Crisfield,  1991.  Non-linear  Finite  Element  Analysis  of  Solids  and  Structures,  J.  Wiley  &  Sons, 
England. 

27.  Wang,  G.  Kinzel  and  T.  Altan,  1993.  Mechanical  Modeling  of  Plane-Strain  Bending  of  Sheet  and  Plate. 
J.  Materials  Processing  Technology,  39,  279-304. 

28.  Hosford,  and  R.  Caddell,  1993.  Metal  Forming:  Mechanics  and  Metallurgy.  Englewood  Cliffs,  Prentice- 
Hall. 

29.  C.J.  Burgoyne,  M.A.  Crisfield,  1990.  Numerical  integration  strategy  for  plates  and  shells.  Int.  J.  Num. 
Meth.  Engng.,  29,  105-121. 

30.  M.J.  Wenner,  1996.  private  communication,  General  Motors  Corporation,  March. 

31. R.H. Wagoner, J.L. Chenot, 1997. Fundamentals of Metal Forming, New York, NY, J. Wiley & Sons.

32. Huang, J.C. Gerdeen, 1994. Springback of Double Curved Developable Sheet Metal Surface. Analysis of Autobody Stamping Technology, (Society of Automotive Engineers), 125-138.

33.  F.  Pourboghrat,  E.  Chu,  1995.  Prediction  of  Springback  and  Side-Wall  Curl  in  2-D  Draw  Bending.  J.  of 
Materials  Processing  Technology,  50,  361-374. 

34.  T.X.  Yu,  L.C.  Zhang,  1996.  Plastic  bending,  theory  and  applications.  World  Scientific. 

35.  Chaboche,  J.  L.,  1986,  Time-independent  constitutive  theories  for  cyclic  plasticity,  International  Journal 
of  Plasticity,  2,  2,  149-188. 

36.  V.  Balakrishnan,  1999.  Measurement  of  in-plane  Bauschinger  effect  in  metal  sheets.  MS  thesis, 
MSE/OSU. 

37. Z.C. Xia, S.C. Tang, J.C. Carnes, 1998. Accurate springback prediction with mixed solid/shell elements, NUMIFORM'98, 813-818.




Development  of  an  Integrated  System  for  Designing 
Steelmaking  Aim  Compositions 

P.  A.  Manohar*,  S.  S.  Shivathaya**,  M.  Ferry*  and  T.  Chandra* 

*  Department  of  Materials  Engineering,  University  of  Wollongong,  Northfields  Avenue, 

Wollongong,  NSW  -  2522,  Australia. 

**Hawker  de  Havilland  Ltd.,  361  Milperra  Road,  Bankstown,  NSW  -  2200,  Australia. 

(Author  was  with  the  Department  of  Mechanical  Engineering,  University  of  Wollongong  when  work  was  carried  out.) 


ABSTRACT 

A  new  integrated  approach  is  proposed  in  this  paper  to  generate  and  evaluate  the  alternative  steelmaking 
aim  compositions  which  not  only  meet  the  customer  requirements  but  also  suit  the  established  rolling 
schedules. The methodology developed is based on a hybrid approach combining knowledge-bases as well as
mathematical  modelling  and  is  applicable  for  C  -  Mn  steel  grades.  The  system  consists  of  two  modules.  The 
first  module  utilises  various  empirical  models  for  the  relationship  between  mechanical  properties  and  the 
elements  in  steelmaking  aim  compositions,  along  with  knowledge  bases  containing  expert  and  heuristic 
knowledge  of  expert  metallurgists  to  generate  a  list  of  alternative  steelmaking  aim  compositions.  The 
second  module  uses  the  output  of  the  first  module  and  computes  the  microstructural  evolution  during 
processing  depending  on  steel  composition  and  known  processing  conditions  such  as  strain,  strain  rate, 
temperature,  interpass  time  and  plate  cooling  rate.  Calculated  values  of  the  metallurgical  parameters  are 
then used to estimate the achievable mechanical properties in the final hot rolled product using knowledge-bases. System output is expected to assist product development metallurgists in the selection of appropriate
steelmaking  aim  compositions  for  any  combination  of  property  specifications  required  by  the  customer. 


INTRODUCTION 

The  steel  industry  is  currently  facing  a  number  of  challenges  such  as  being  more  flexible  and  responsive, 
less  capital  intensive,  energy  efficient  and  environmentally  “greener”.  Competition  from  non-ferrous  metals 
and  non-metallic  materials  is  intensifying.  Relentless  pressure  is  applied  by  ever-increasing  quality  demands 
from  the  customer.  The  need  for  cost  reduction  is  driving  the  industry  towards  increased  automation  to 
produce  higher  quality  steels  at  reasonable  cost.  To  deal  with  these  challenges  and  for  efficient  management 
of  the  uncertainties  involved,  it  has  become  imperative  to  apply  artificial  intelligence  (AI)  techniques  by 
developing  expert  systems  in  almost  all  areas  of  steelmaking  practice.  In  the  past  three  decades,  a  number  of 
systems  have  been  developed  around  the  world  for  more  efficient  solutions  to  problem  areas  of  diagnostics, 
design,  planning,  scheduling,  process  control,  and  quality  control  [1-3].  Application  of  a  knowledge-based 
approach  to  steel  composition  design  has  been  considered  by  several  researchers  [4-5].  Development  of 
such  systems  is  a  complex  task  because  the  material  design  process  is  ill-structured,  difficult  to  systematise 
and involves a large number of rules. In addition, the relationships between composition and process parameters and product properties are nonlinear. The knowledge of steel composition design is largely
intuitive  and  heuristic. 

On the other hand, hot rolling of steels has been investigated intensively and several mathematical models and computer systems have been developed. Several major objectives of the mathematical models have been reported, which include: improving the efficiency of mill trials to establish optimum compositions and rolling schedules [6]; prediction of microstructure and mechanical properties during rolling [7]; development of new steel grades and rolling processes [8-9]; increasing productivity and quality and reducing manufacturing cost through use as an off-line prediction, on-line prediction, on-line control or off-line alloy and process design tool [10-11]; ability for flexible manufacturing [12]; control of size, shape, quality and stability of steel products and greater responsiveness in product development [13]; better understanding of the processes [14]; and a useful "what-if" tool which provides directions for further fundamental research along with problem investigation, schedule development, design or redesign of mill configuration and enhancement of understanding [15].


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




In the current work, a new integrated approach is proposed which combines the above two approaches to generate and evaluate the alternative steelmaking aim compositions which meet the customer requirements. The system consists of two modules. The first module uses both mathematical (iterative) and knowledge-based approaches and utilises both interview and non-interview techniques for knowledge elicitation (KEL). KEL is also characterised by a 3-character codification scheme to record the customer's special requirements. The codification scheme is coupled with a decision table-based knowledge representation tool, "TABLEAUX", for incorporation within knowledge-based systems. The system generates a list of alternative aim compositions which may meet the property requirements. The compositions are then evaluated using the second module, which applies a mathematical modelling approach to calculate the microstructural evolution during hot rolling of the steel. The estimated values of metallurgical parameters are then converted to predictions of the mechanical properties of the steel products using empirical relationships, thus enabling more realistic assessment of the designed compositions. The system is developed in the C language on an IBM PC in a Windows environment. The user interface is developed utilising a commercial package, PROTOGEN+, to make the system user friendly. The system is expected to assist product development metallurgists in the selection of appropriate steelmaking aim compositions and in hot rolling process optimisation.


KNOWLEDGE  ELICITATION 

The process for knowledge elicitation (KEL) adopted in this work has been reported elsewhere [16]; however, a brief summary is given here. The KEL is characterised by a three-character codification scheme having a hybrid structure to codify all the customer's special requirements based on the initial structured and unstructured interviews. The customer special requirement codes (CSRCs) are given by the equation:


Customer Special Requirement Code = X_i Y_j Z_k

The first character in the code, X_i, is called the major group code and represents the i-th property of a steel grade (e.g. tensile strength, yield strength, elongation etc.). The second character in the CSRC, Y_j, is called the subgroup code and represents the j-th type of steel (e.g. structural, pressure vessel, line pipe steel etc.). Z_k is the value code, which represents the value of the properties relating to each combination of X_i and Y_j. Z_k has a hierarchical structure while X_i and Y_j are chain type structures. Figure 1 illustrates the codification scheme along with major codes, subgroup codes and value codes. A total of 238 328 CSRCs are possible using this codification scheme.
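
The quoted total of 238 328 equals 62^3, which is what a three-character code yields if each position can hold any of 62 alphanumeric symbols (A-Z, a-z, 0-9). The paper does not state the alphabet, so the 62-symbol set in the Python sketch below is an inference, and the helper name is hypothetical.

```python
import string

# Assumed alphabet: 26 upper-case + 26 lower-case letters + 10 digits.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits


def count_codes(alphabet_size, positions=3):
    """Number of distinct fixed-length codes over an alphabet."""
    return alphabet_size ** positions


print(len(ALPHABET))               # 62
print(count_codes(len(ALPHABET)))  # 238328
```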


Fig. 1. Codification scheme for the Customer's Special Requirement Codes (CSRCs).

Major code descriptions: Charpy Impact Test; RAZ; Elongation; Tensile Strength; Drop Wt. Tear Test; Nil Ductility Tear Test.
Subgroup code descriptions: AS3678; AS 1548; ASTM A516; JIS3106; DIN 17155; BS 1501.
Value code descriptions (for Subgroup Code '1' and Major Code '1'): Min. absorbed energy ave. of 3 tests is 27J & individual is 20J at -15°C; Min. absorbed energy ave. of 3 tests is 27J & individual is 20J at 0°C.




DETERMINING  AIM  COMPOSITION 

The  approach  adopted  in  this  work  for  the  design  of  steelmaking  aim  composition  is  represented  in  the  form 
of  a  flow  chart  shown  in  Figure  2. 


Fig.  2.  Flow  chart  for  determining  steelmaking  aim  compositions. 




A list of alternative steelmaking aim compositions is generated using four knowledge bases, two iterative models and one steel processing module as shown in Figure 2. Input information from the user regarding the material enquired or ordered is obtained through interactive dialogue sessions: the material standard, its size, quantity, weight, end use, and the customer's special requirements, if any. Depending on the type of inputs, the knowledge base KB I is accessed, which consists of information on properties and composition corresponding to relevant material standards. The material standards include Australian standards and other overseas standards transformed into a form which is similar to the Australian standards. Customer special requirements are also included in KB I. Based on the customer special requirements, the composition and mechanical properties from the existing steels need to be modified. This is achieved through the knowledge rules contained in the second knowledge base, KB II. The knowledge representation in KB II is done in the form of IF-THEN rules which relate to two main categories: composition and processing. An example of rules in the composition category is given below:

IF  Type  of  steel  is  structural  AND  Testing  requested  is  RAZ  AND  Value  of  RAZ  is  25%  minimum 
THEN Maximum S is 0.005% AND Maximum H is 0.00019% AND Maximum Ca is 0.010%.
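A rule of this kind can be held as data rather than as program logic. The sketch below is only an illustration of that idea: the class `CompositionRule`, the fact keys and the function `apply_rules` are hypothetical names, and the paper does not describe how KB II is stored.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CompositionRule:
    conditions: Dict[str, object]  # facts that must all match (the IF part)
    max_limits: Dict[str, float]   # element -> maximum wt% (the THEN part)

    def applies_to(self, facts: Dict[str, object]) -> bool:
        """True when every condition of the rule is met by the given facts."""
        return all(facts.get(key) == want for key, want in self.conditions.items())


# The example rule quoted in the text, with hypothetical key names.
RAZ_RULE = CompositionRule(
    conditions={"steel_type": "structural", "test": "RAZ", "raz_min_percent": 25},
    max_limits={"S": 0.005, "H": 0.00019, "Ca": 0.010},
)


def apply_rules(facts: Dict[str, object], rules: List[CompositionRule]) -> Dict[str, float]:
    """Collect the tightest maximum for each element over all rules that fire."""
    limits: Dict[str, float] = {}
    for rule in rules:
        if rule.applies_to(facts):
            for element, maximum in rule.max_limits.items():
                limits[element] = min(maximum, limits.get(element, maximum))
    return limits
```

Keeping the tightest limit when several rules fire is a design choice made here for illustration; the source does not say how conflicting rules are resolved.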

Input information about the end use or intended application of the steel, along with information in KB I
and KB II, dictates a set of rules regarding the elements to be included in the aim composition and the basic
process route to be followed. The process route could be hot rolled, controlled rolled, or normalised. These
rules are included in the third knowledge base, KB III. Upper and lower values of aim composition
determined by KB III are based on the assumption that a tolerance can be applied to the certification limit
values (CLIM) to obtain the minimum and maximum values of aim composition. In the case of carbon, the
minimum and maximum values are given by

$C_{min} = \text{lower CLIM} + 0.02$, $C_{max} = \text{upper CLIM} - 0.02$.
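As a minimal sketch of this step, the function below narrows a certification range by the stated tolerance. The function name `aim_window` and the example limits are hypothetical; only the 0.02 tolerance for carbon comes from the text.

```python
from typing import Tuple


def aim_window(lower_clim: float, upper_clim: float, tolerance: float = 0.02) -> Tuple[float, float]:
    """Return (aim_min, aim_max) by moving each certification limit inward."""
    aim_min = lower_clim + tolerance
    aim_max = upper_clim - tolerance
    if aim_min > aim_max:
        # The certification range is too narrow for the requested tolerance.
        raise ValueError("tolerance leaves no feasible aim window")
    return aim_min, aim_max


# Hypothetical certification limits for carbon, for illustration only.
c_min, c_max = aim_window(0.10, 0.22)
```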

Iterative Model I, which follows KB III, calculates the initial aim values of the CLIM using the information
from KB III combined with expert and heuristic knowledge. The model calculates the carbon equivalent
(CEQ) for each CLIM, uses empirical formulae to convert these CEQ values to mechanical properties and
then finds a range of CLIM values which are close to the required combination of CEQ, mechanical
properties, steel grade and thickness. Some values of aim composition, in spite of being within the range of
values obtained through KB III, are infeasible due to practical difficulties faced by either the plate mill or
the slab caster in using them. In addition, based on the end use and the mechanical properties required,
certain strategies need to be adopted in the design of the aim compositions. Such strategies impose further
restrictions on the aim composition values. The rules regarding process limitations and design strategies
are contained in the fourth knowledge base, KB IV. Some examples of such rules are:

increment in C ($\Delta$C) = 0.005%, $\Delta$Nb = 0.001%, $\Delta$B = 0.0001%, Cu:Ni < 2.0, Ti:N < 3.42, Mn:C > 3.
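The increments and ratio limits above can be read as a candidate grid plus a feasibility filter. The sketch below takes that reading; the function names are hypothetical, and treating a ratio rule as satisfied when its numerator element is absent is an assumption rather than something the paper states.

```python
import math
from typing import Dict, List


def stepped_values(low: float, high: float, step: float) -> List[float]:
    """Values from low to high (inclusive) in fixed increments, e.g. dC = 0.005."""
    count = math.floor((high - low) / step + 1e-9)
    return [round(low + i * step, 6) for i in range(count + 1)]


def satisfies_ratio_rules(comp: Dict[str, float]) -> bool:
    """Check Cu:Ni < 2.0, Ti:N < 3.42 and Mn:C > 3 for a composition in wt%."""
    cu, ni = comp.get("Cu", 0.0), comp.get("Ni", 0.0)
    ti, n = comp.get("Ti", 0.0), comp.get("N", 0.0)
    mn, c = comp.get("Mn", 0.0), comp.get("C", 0.0)
    # Ratios are written as products so that a zero denominator cannot divide.
    cu_ni_ok = cu == 0.0 or cu < 2.0 * ni
    ti_n_ok = ti == 0.0 or ti < 3.42 * n
    mn_c_ok = mn > 3.0 * c
    return cu_ni_ok and ti_n_ok and mn_c_ok


# Carbon candidates between hypothetical aim limits, in steps of 0.005 wt%.
carbon_candidates = stepped_values(0.12, 0.20, 0.005)
```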

The output from KB IV and the process details given in KB III are combined in the steel processing module,
which calculates the metallurgical structure evolution as a function of composition and process sequence.
Details of the steel processing module are given in the following section.


STEEL  PROCESSING  MODULE 

The  flow  chart  for  the  steel  processing  module  is  given  in  Figure  3. 

Mathematical  modelling  of  microstructural  evolution  during  hot  rolling  of  steels  has  received  a  great  deal  of 
attention  over  the  past  two  decades  and  a  number  of  models  which  describe  metallurgical  phenomena 
during  steel  processing  have  been  published  for  different  steel  compositions  (eg.  C-Mn,  Nb-/Ti-/Nb-Ti/Nb- 
V  microalloyed  steels)  and  a  variety  of  steel  processing  routes  (eg.  conventional,  conventional  controlled 
rolling,  recrystallization  controlled  rolling,  hot  direct  rolling  etc.).  These  models  have  been  reviewed  by 
Sellars [17] and Kwon [11]. The basic equations employed in the current work are given in Table 1.
Iterative Model II calculates the mechanical properties for each steelmaking aim composition (SAC) based
on the output from KB IV and the steel processing module. Empirical models derived from statistical
data are utilised for this process. Because the empirical models are characterised by an error of about
±20 MPa in the prediction of tensile strength and upper yield strength, a factor of safety of 40 MPa is
added to the required values of tensile and upper yield strength when they are compared with the
corresponding computed values. The final aim composition list generated in this way contains alternative
aim compositions that are feasible for any inquiry or order.
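The comparison with the 40 MPa margin can be sketched as a simple filter over candidate compositions. The field names `ys` and `ts`, the function names and the example values are hypothetical; only the 40 MPa figure is taken from the text.

```python
from typing import Dict, List

SAFETY_MARGIN_MPA = 40.0  # added to the required strengths, as stated in the text


def meets_strength_requirements(computed_ys: float, computed_ts: float,
                                required_ys: float, required_ts: float,
                                margin: float = SAFETY_MARGIN_MPA) -> bool:
    """True when both computed strengths clear the requirement plus the margin."""
    return (computed_ys >= required_ys + margin
            and computed_ts >= required_ts + margin)


def feasible_compositions(candidates: List[Dict[str, float]],
                          required_ys: float, required_ts: float) -> List[Dict[str, float]]:
    """Keep the candidates whose computed yield and tensile strengths pass."""
    return [c for c in candidates
            if meets_strength_requirements(c["ys"], c["ts"], required_ys, required_ts)]


# Hypothetical candidates with computed upper yield (ys) and tensile (ts) strengths in MPa.
candidates = [
    {"C": 0.14, "ys": 285.0, "ts": 450.0},
    {"C": 0.16, "ys": 300.0, "ts": 480.0},
]
accepted = feasible_compositions(candidates, required_ys=250.0, required_ts=430.0)
```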


[Figure 3 content, recovered from the original layout]
Initial Structure Model: input = [illegible], T, soaking time; calculate $d_0$
Hot Deformation Model: input = $\varepsilon$, $\dot{\varepsilon}$, T, $d_0$, t; calculate $d_{rex}$, X, $h_f$
Phase Transformation Model: input = dT, $S_v$, cooling rate, composition; calculate size and volume fraction of phases
Structure-Property Model: input = output of the above model, composition and Knowledge-Base IV; calculate C.Y.S., C.T.S.


Fig.  3.  Flow  chart  for  the  steel  processing  module. 


Table 1. Summary of the basic equations used in the computer model for processing a C-Mn steel.

Parameter | Equation | Reference
Pass strain, $\varepsilon$ | $1.155 \ln(h_0/h_f)$ | [15]
Pass strain rate, $\dot{\varepsilon}$ | $\varepsilon V_R / \sqrt{R_R (h_0 - h_f)}$ | as above
Time for 50% recrystallization, $t_{0.5}$ | $2.5 \times 10^{-19} \times d_0^{2} \times \varepsilon^{-4} \times \exp(300000/RT)$ | as above
Volume fraction recrystallized, $X$ | $1 - \exp(-0.693\,(t/t_{0.5})^{n})$ [value of the exponent $n$ illegible] | [illegible]
Recrystallized grain size, $d_{rex}$ | $0.763\, d_0^{0.67} / \varepsilon$ | [15]
Zener-Hollomon parameter, $Z$ | $\dot{\varepsilon} \times \exp(312000/RT)$ | [18]
Time for 95% recrystallization, $t_{0.95}$ | $3.54 \times 10^{-21} \times \varepsilon^{-4} Z^{-3/8} \times d_0^{2} \exp(480000/RT)$ | as above
Grain growth during interpass time | $d^{7} - d_0^{7} = 1.45 \times 10^{27} \times \exp(-400000/RT) \times t_{eff}$; $t_{eff} = t - t_{0.95}$ | [19]
Grain size estimation when $X < 0.95$ (partially recrystallized austenite) | $\varepsilon_{eff} = \varepsilon_{pass} + \Delta\varepsilon$; $\Delta\varepsilon = \text{const.} \times \varepsilon_{previous} \times (1 - X)$; const. = 1 if $X < 0.1$, 0.5 if $X > 0.1$ | [20]
 | $d_{eff} = 0.5 \times d_0^{0.67} \times \varepsilon_{eff}^{-0.67}$; $d = X^{4/3} \times d_{rex} + (1 - X)\, d_{eff}$ | [15]

$h_0$ = original slab thickness, $h_f$ = final slab thickness, T = pass temperature, t = interpass time, $V_R$ = peripheral roll
speed (mm/s), $R_R$ = roll radius (mm), R = gas constant (8.31 J/(mol·K)), $d_0$ = initial grain size.
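Several of the tabulated relations translate directly into code. The sketch below covers only the rows that are fully legible; the function names are hypothetical, temperature is assumed to be in kelvin, and the grain size is assumed to be in the units used by the cited references. Because the exponent in the recrystallized-fraction equation is not legible in the source, it is left as a required argument rather than given a default.

```python
import math

R_GAS = 8.31  # J/(mol K), value quoted in the table footnote


def pass_strain(h0: float, hf: float) -> float:
    """Pass strain from entry thickness h0 and exit thickness hf (same units)."""
    return 1.155 * math.log(h0 / hf)


def pass_strain_rate(strain: float, roll_speed: float, roll_radius: float,
                     h0: float, hf: float) -> float:
    """Pass strain rate; roll speed in mm/s, roll radius and thicknesses in mm."""
    return strain * roll_speed / math.sqrt(roll_radius * (h0 - hf))


def time_for_half_recrystallization(d0: float, strain: float, temp_k: float) -> float:
    """t_0.5 from initial grain size, pass strain and absolute temperature."""
    return 2.5e-19 * d0 ** 2 * strain ** -4 * math.exp(300000.0 / (R_GAS * temp_k))


def fraction_recrystallized(t: float, t_half: float, exponent: float) -> float:
    """X after time t; the exponent must be supplied by the caller."""
    return 1.0 - math.exp(-0.693 * (t / t_half) ** exponent)


def recrystallized_grain_size(d0: float, strain: float) -> float:
    """d_rex from the initial grain size and the pass strain."""
    return 0.763 * d0 ** 0.67 / strain


def zener_hollomon(strain_rate: float, temp_k: float) -> float:
    """Z from the strain rate and absolute temperature."""
    return strain_rate * math.exp(312000.0 / (R_GAS * temp_k))
```

By construction the recrystallized fraction is close to 0.5 when t equals t_0.5, whatever exponent is chosen, which gives a quick sanity check on any implementation.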


SUMMARY 

The  new  integrated  system  for  material  design  proposed  in  this  work  combines  both  mathematical 
modelling  and  knowledge-based  approaches.  Mathematical  modelling  enables  iterations  involving 
enormous  computations  while  the  knowledge-based  approach  enables  utilisation  of  the  expert  as  well  as  the 
heuristic  knowledge  from  a  group  of  experts  to  successfully  determine  the  steelmaking  aim  compositions. 
The procedure involved in this approach is to identify the possible customer requirements with regard to
composition, mechanical properties and testing requirements, then to codify them together with the
processing schedules used in the industry, and finally to direct the KEL to acquire knowledge to deal with
these special customer requirements. The quality of the output of this system depends mainly on the quality
of the rules in the knowledge bases and of the mathematical models. As the knowledge base grows richer
through experience and the mathematical models are refined further through research, it is always possible
to incorporate more rules into the knowledge bases to improve the output of the system. The system is
expected to assist metallurgists in choosing an existing composition or designing a new steel composition so
that the customer requirements are satisfied in an economical way. The prototype material design system has
been fully implemented by developing a software module for generating alternative steelmaking aim
compositions that are practically feasible for the slab caster and plate mill. The process optimisation
module is currently under development.


REFERENCES 

1.  H.  Kominami,  S.  Naitoh,  N.  Kamada,  C.  Hamaguchi,  T.  Tanaka,  H.  Endo,  1991.  Neural  network  system 
for  breakout  prediction  in  continuous  casting  process.  Nippon  Steel  Tech.  Report,  49(4),  34-38. 

2.  H.  Yasuda,  Y.  Nakatsuka,  A.  Yamamoto,  I.  Takeuchi,  T.  Hashimoto,  1992.  An  expert  system  for  the 
material  design  of  large-diameter  steel  pipe.  The  Sumitomo  Search,  50(7),  3-10. 

3.  R.S.H.  Mah,  K.D.  Schnelle,  A.N.  Patel,  1991.  A  plant-wide  quality  expert  system  for  steel  mills. 
Computers  in  Chemical  Engineering,  15(6),  445-450. 

4.  S.S.  Shivathaya,  1997.  Material  design  in  steelmaking  utilising  mathematical  modelling,  knowledge- 
based  and  fuzzy  logic  approaches.  Ph.  D.  Thesis,  University  of  Wollongong,  Australia. 

5. F.J. Vasko, F.E. Wolf, K.L. Stott, 1989. A set covering approach to metallurgical grade assignment.
European  Journal  of  Operational  Research,  38,  27-34. 

6.  T.  Abe,  T.  Honda,  S.  Ishizaki,  H.  Wada,  N.  Shikanai,  T.  Okita,  1990.  Application  of  computer 
modelling  of  thermo-mechanical  processing  on  steel  plate  production,  proc.  int.  symp.  on  Mathematical 
Modelling  of  Hot  Rolling  of  Steel,  ed.  by  S.  Yue,  Hamilton,  Canada,  66-75. 

7.  P.  Choquet,  A.  Le  Bon,  Ch.  Perdrix,  1985.  Mathematical  model  for  prediction  of  austenite  and  ferrite 
microstructures  in  hot  rolling  processes,  proc.  int.  conf.  on  Strength  of  Metals  and  Alloys  (ICSMA  7),  2, 
ed. by H.J. McQueen et al., Montreal, Canada, 1025-1030.

8.  P.D.  Hodgson,  R.K.  Gibbs,  1990.  A  mathematical  model  to  predict  the  final  properties  of  hot  rolled  C- 
Mn  and  microalloyed  steels,  proc.  int.  symp.  on  Mathematical  Modelling  of  Hot  Rolling  of  Steel,  ed.  by 
S.  Yue,  Hamilton,  Canada,  76-85. 

9.  A.  Laasraoui,  J.J.  Jonas,  1991.  Recrystallization  of  austenite  after  deformation  at  high  temperatures  and 
strain  rates  -  analysis  and  modelling.  Met.  Trans.  ASM,  22A,  151-160. 

10. O. Kwon, 1995. Modelling of austenite evolution and transformation for MA strips, proc. int. conf.
Microalloying  95,  ed.  by  M.  Korchynski  et  al.,  ISS,  Pittsburgh,  USA,  251-261. 

11. O. Kwon, 1992. Technology for the prediction and control of microstructural changes and mechanical
properties  in  steel.  ISIJ  International,  32,  350-358. 

12.  M.  Suehiro,  K.  Sato,  Y.  Tsukano,  H.  Yada,  T.  Senuma,  Y.  Matsumura,  1987.  Computer  modelling  of 
microstructure  change  and  strength  of  low  carbon  steel  in  hot  strip  rolling.  Trans.  ISIJ,  27, 439-445. 

13. N.  Komatsubara,  K.  Kunishige,  S.  Okaguchi,  T.  Hashimoto,  K.  Ohshima,  I.  Tamura,  1990.  Computer 
modelling  for  the  prediction  and  control  of  mechanical  properties  in  plate  and  sheet  steel  production. 
The  Sumitomo  Search,  44,  159-168. 

14. M. Pietrzyk, C. Roucoules, P.D. Hodgson, 1995. Modelling the thermomechanical and microstructural
evolution during rolling of a Nb HSLA steel. ISIJ International, 35, 531-541.

15. J.H.  Beynon,  C.M.  Sellars,  1992.  Modelling  microstructure  and  its  effects  during  multipass  hot  rolling. 
ISIJ  International,  32,  359-367. 

16. X.D.  Fang,  S.S.  Shivathaya,  1995.  Eliciting  knowledge  for  material  design  in  steelmaking  using  paper 
models  and  codification  scheme.  Engineering  Applications  of  Artificial  Intelligence,  8(1),  15-24. 

17. C.M. Sellars, 1990. Modelling - an interdisciplinary activity, proc. int. symp. on Mathematical Modelling
of  Hot  Rolling  of  Steel,  ed.  by  S.  Yue,  Hamilton,  Canada,  1-18. 

18. C.M. Sellars, J. Whiteman, 1979. Recrystallization and grain growth in hot rolling. Met. Sci., 13, 187-194.

19.  P.D.  Hodgson,  R.K.  Gibbs,  1992.  A  mathematical  model  to  predict  the  mechanical  properties  of  hot 
rolled  C-Mn  and  microalloyed  steels.  ISIJ  International,  32,  1329-1338. 

20.  A.  Laasraoui,  J.J.  Jonas,  1991.  Prediction  of  temperature  distribution,  flow  stress  and  microstructure 
during  multipass  hot  rolling  of  steel  plate  and  strip.  ISIJ  International,  31, 95-105. 




A  SCADA-based  Expert  System  To  Provide  Delay  Strategies 
for  a  Steel  Billet  Reheat  Furnace 

Clifford  Mui*,  Edmund  Osinski**,  John  A.  Meech**,  Peter  V.  Barr** 

*DynaMotive Technologies Corporation, Vancouver, B.C., Canada
Email: cliff_mui@bc.sympatico.ca
**Centre for Metallurgical Process Engineering,
University of British Columbia, Vancouver, B.C., Canada
Email: iam@mining.ubc.ca, pvbarr@interchange.ubc.ca

ABSTRACT 

The  manufacturing  of  steel  bar  products  in  mini-mills  involves  continuous  casting  of  billet  sections,  cooling 
of  the  billets,  reheating  to  rolling  temperatures  and  final  shaping  and  size  reduction  in  rolling  mills.  The 
operation  of  the  reheat  furnaces  is  a  significant  challenge  due  to  the  dynamic  nature  of  both  the  reheating 
and  rolling  processes.  The  operation  of  a  furnace  was  analyzed  with  the  use  of  a  SCADA  data  collection 
system,  and  with  both  steady  state  and  transient  mathematical  models.  The  new  knowledge  gathered  in  this 
way was complemented by knowledge from experienced mill personnel to form the basis for an expert
system  designed  to  offer  timely  advice  to  furnace  operators.  The  result  was  development  of  an  industrial 
expert  system  leading  to  an  increase  in  furnace  and  rolling  mill  productivity  [1]. 


INTRODUCTION 

In  a  typical  mini-mill  steelmaking  plant,  the  reheat  furnace  is  situated  between  the  caster  which  produces 
steel  billets,  and  the  rolling  mill  which  shapes  the  billets  into  finished  products.  The  operation  of  a  steel 
billet  reheat  furnace  located  in  Alberta,  Canada  has  been  utilized  in  our  analysis.  Production  of  material 
such  as  construction  rebar  or  rail  sections  involves  conversion  of  raw  billets  into  hot  rolled  products.  Billets 
must  be  heated  in  the  furnace  in  order  to  bring  average  billet  temperature  to  a  point  at  which  the  billets  can 
be  rolled  with  reasonably  low  force  and  with  a  proper  microstructure  at  the  end  of  rolling. 

In  an  ideal  world,  all  processes  would  run  at  steady  state  and  no  problems  would  occur  to  disturb  this 
perfect  equilibrium.  In  this  perfect  world,  steel  billets  are  charged  cold  into  the  furnace,  heated  for  a  set 
time  until  they  can  be  removed  in  a  hot  state  at  regular  intervals,  and  all  billets  are  heated  homogeneously 
prior  to  rolling  in  the  downstream  mill  rolls.  The  furnace,  of  course,  would  be  operated  at  optimal  steady- 
state  conditions  at  all  times  and  would  never  require  adjustment.  The  rolling  mill  would  be  able  to  accept 
these  billets  in  a  timely  fashion  and  would  shape  all  of  the  bars  successfully  into  rolled  products. 
Unfortunately,  this  ideal  world  does  not  exist. 

In  the  real  world,  furnaces  routinely  experience  transient  conditions.  For  example,  scheduled  delays  due  to 
regular  downstream  roll  changes  or  unscheduled  delays  due  to  unexpected  cobbles  are  a  few  of  the 
conditions  that  may  be  encountered  during  initial  charging  of  cold  billets.  Furnaces  have  large  thermal 
inertia  -  things  change  slowly  and  errors  take  time  to  recover  from.  Control  of  furnace  temperatures  is 
critical  if  the  desired  result  is  to  produce  a  homogeneously-heated  billet  at  the  proper  time  and  proper 
temperature.  The  consequences  of  improperly  controlled  delays  may  include  unnecessary  rolling  mill  idle 
time,  uneven  rolling  of  unevenly  heated  billets,  center-line  cracking  or  product  downgrading  due  to 
excessive  heating  time  within  the  furnace.  Intelligent  control  of  this  process  is  the  responsibility  of 
experienced  furnace  operators  who  base  their  decisions  on  a  myriad  of  different  factors.  Consistency  of 
such  operator-based  control  can  be  poor  at  the  best  of  times  especially  when  new  employees  enter  the 
situation  and  require  training.  So  a  systematic,  computerized  approach  seems  to  be  the  solution. 

Artificial Intelligence is defined as "a collection of computer-based techniques which manipulate symbols
rather than numbers to enable computers to produce behavior resembling that previously only seen in
humans" [2]. Expert Systems involve the application of Artificial Intelligence concepts to real world
problems. Expert Systems operate very differently from conventional computer programs in that the
problem solving techniques, or "heuristics", mimic human problem solving. Expert systems would therefore
have a distinct advantage over conventional control schemes in experience-laden applications such as steel
reheating furnaces. The ultimate goal of this exercise was to improve product consistency and lower mill
operating costs.

0-7803-5489-3/99/$10.00 ©1999 IEEE.


THE  STEEL  REHEAT  FURNACE 

The  facility  chosen  to  implement  this  expert  system  was  a  mini-mill  located  in  Alberta,  Canada.  The  mill 
buys  scrap  steel  on  the  open  market,  melts  the  raw  material  in  an  electric  arc  furnace  and  makes 
metallurgical  adjustments  to  composition  prior  to  a  continuous  casting  process  to  produce  square  steel 
billets.  These  billets  are  taken  to  a  storage  yard  where  they  are  cooled  prior  to  scheduled  charging  into  a 
natural  gas-fired  reheat  furnace.  The  billets  are  heated  in  the  furnace  to  obtain  a  homogeneous  temperature 
prior  to  hot  forming.  The  hot  billets  from  the  furnace  are  then  shaped  in  a  rolling  mill  to  produce  bar 
products  such  as  grinding  stock,  rebar  and  structural  stock. 

The natural gas-fired furnace utilizes a combination of stationary and walking beams which carry the billets
into  and  out  of  the  heating  zones.  The  furnace  is  controlled  as  three  distinct  zones  in  which  individual  PLCs 
(programmable  logic  controllers)  control  zone  temperatures  in  accordance  with  set-points  chosen  by  the 
shift  furnace  operators.  Operators  follow  general  guidelines  for  control  but  they  rely  mainly  on  experience 
and  "know  how"  to  select  appropriate  measures  during  both  steady  state  and  transient  operating  conditions. 
Figure  1  illustrates  a  side  cut-away  view  of  the  reheating  furnace. 


[Figure 1 labels: Flue; Top Fired, Walking Beam, Steel Reheat Furnace; Discharge]


Fig.  1  Side  View  of  the  Reheat  Furnace. 

In conjunction with the expert system initiative, the operation of the furnace was examined
thoroughly using mathematical models. A steady-state model was originally developed by Barr [3] to study
the efficiency of steady-state operating procedures. A plant trial in 1995 produced data to assist in the
evolution of a 3-D transient model which was later verified at two Canadian reheat furnaces by Scholey [4].
These models, along with control-scenario development by Osinski [3], became integral to the development
of this project.


BUILDING  OF  THE  KNOWLEDGE  BASE 

The operational knowledge used to control the furnace, in the form of operator experience, should be a
valuable starting point in building a knowledge base. Unfortunately, furnace operators have created many
different approaches to deal with transient conditions and their experience under similar circumstances,
while useful, was too inconsistent to allow collection of the "best" knowledge. This "episodic" knowledge
was disjointed and often clouded by "process lore" [6]. Therefore these ideas were not used directly in the
knowledge acquisition phase. However, a set of "Standard Operating Practices" (SOP) was created by the
plant combustion engineer, drawing on discussions with the operators, to provide congruency in operating
strategy. This agreement on the required furnace operating practices provided a good degree of procedural
consensus [5]. The majority of this knowledge is in the form of "declarative" and "procedural" knowledge
and is considered significant because it was scrutinized by the combustion engineer and is untainted by
process lore. The SOP became the basis for creation of a reheat furnace control knowledge base and was
incorporated into the Expert SCADA System as an on-line hypertext help document.

The expert system was designed so that its basic knowledge can be maintained and newly developed or
acquired knowledge can be added. The expert system is able to provide timely, online knowledge-based
advice through a user-friendly man-machine interface. In addition, the system created a way for mill
management to introduce a standardized methodology to handle the majority of furnace transients.


ACQUIRING  PROCESS  INFORMATION 

The SCADA package chosen to implement this expert system was ProcessVision and Comdale/C operating
on a PC in a QNX operating system environment. The Comdale/C application can be separated into a series
of modules, each given specific tasks. Although these modules all operate independently in QNX, they are
linked so that each can influence the operation of the others. The Comdale system was originally designed
to operate efficiently with processes in which reaction times are relatively slow. The inference engine
operates with a cycle time of one second or greater, which is adequate for moderately-paced supervisory
control situations on continuous processes. The cycle times of such systems must be fast enough to sample
critical events accurately. The reheat furnace system has a myriad of slow-response inputs but also one very
fast output.

This latter signal is generated by an optical pyrometer which provides a continuous analog measurement
of the temperature profile of a billet discharged from the furnace. Discharge is a batch occurrence taking
place regularly every 1 to 2 minutes depending on the production level. A billet passes in front of
the instrument within a 2 second time interval. The data collection system in place when this project
commenced recorded about 2000 temperature points along the billet, i.e., one record every millisecond.
In order to link such high frequency, intermittent data to the intelligent supervisory control and
data-acquisition system (I-SCADA), a set of high speed Computational Intelligence (CI) modules was
designed to operate in parallel with the Man-Machine Interface (MMI) [7,8]. These data collection drivers
intelligently sample input signals until an event is detected. After the driver has buffered the record of the
event internally, the data are processed and uploaded into the inference engine at a rate equivalent to the
current billet production rate of one billet every 1 to 2 minutes.
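The buffering behaviour described above can be sketched as a generator that watches a fast sample stream and emits one buffered record per billet pass. This is an illustration only: the paper does not say how an event is detected, so the temperature threshold, the minimum record length and the names `capture_events` and `summarise_profile` are all assumptions.

```python
from typing import Dict, Iterable, Iterator, List


def capture_events(samples: Iterable[float], threshold_c: float,
                   min_points: int = 10) -> Iterator[List[float]]:
    """Yield one buffered temperature record per billet pass.

    A pass is assumed to start when a reading reaches threshold_c and to end
    when the readings fall below it again; shorter bursts are discarded.
    """
    buffer: List[float] = []
    for reading in samples:
        if reading >= threshold_c:
            buffer.append(reading)
        elif buffer:
            if len(buffer) >= min_points:
                yield buffer
            buffer = []
    if len(buffer) >= min_points:
        yield buffer


def summarise_profile(profile: List[float]) -> Dict[str, float]:
    """Reduce a buffered record to the few values passed on to the slow cycle."""
    return {
        "points": len(profile),
        "min_c": min(profile),
        "max_c": max(profile),
        "mean_c": sum(profile) / len(profile),
    }


# Synthetic stream: background readings with two short hot passes.
stream = [20.0] * 5 + [1100.0, 1150.0, 1120.0] + [20.0] * 5 + [1090.0, 1110.0, 1105.0] + [20.0]
events = list(capture_events(stream, threshold_c=800.0, min_points=3))
summaries = [summarise_profile(event) for event in events]
```

Reducing each 2000-point record to a small summary before upload is one way to reconcile millisecond sampling with an inference cycle of a second or more.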


APPROACH  TO  THE  REHEAT  PROBLEM 

The furnace had been operating reasonably satisfactorily in preparing billets for the rolling mill but there
was a strong desire to improve certain aspects of the process. Delays in production of hot billets result in
loss of useful rolling mill time and a consequent monetary loss. As well, poor procedural operation results in
low recovery rates or product yields due to billet rejection and product downgrading. One goal of the
project was to reduce both of these losses as well as fuel consumption. Part of this goal can be achieved by
having the operators use standardized procedures in response to delay situations. The expert system
provided direct access to all of the mill's standardized settings and procedures.

A secondary objective of the expert system was to collect as much static knowledge as possible on the
operation of the furnace and to make timely assessments of transients based on the "best guess" of furnace
operators. The system contains logic rules to provide automatic detection of a delay situation, which allows
the system to flag the operator with an initial estimate of the delay duration. This estimate is made in
linguistic terms as a short, medium, or long delay. The range of each of these terms is assigned from the
knowledge base depending on the type of product being heated. The system issues an advisory based on this
input as well as information provided by the knowledge base (SOP) with respect to the delay category. The
category determined causes the system to make specific recommendations regarding burner settings in an
attempt to reduce energy consumption and avoid overheating of the billets held within the furnace.
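The product-dependent mapping from a delay duration to a linguistic term can be sketched as a small lookup. The product names are taken from the mill description, but the time limits below are invented for illustration; the paper does not give the actual ranges, and `DELAY_LIMITS_MIN` and `classify_delay` are hypothetical names.

```python
from typing import Dict

# Hypothetical upper bounds, in minutes, for the "short" and "medium" terms.
# Anything beyond the "medium" bound is treated as "long".
DELAY_LIMITS_MIN: Dict[str, Dict[str, float]] = {
    "rebar": {"short": 10.0, "medium": 30.0},
    "grinding stock": {"short": 5.0, "medium": 20.0},
}


def classify_delay(product: str, minutes: float) -> str:
    """Map a delay duration to 'short', 'medium' or 'long' for a product."""
    if minutes < 0:
        raise ValueError("delay duration cannot be negative")
    limits = DELAY_LIMITS_MIN[product]
    if minutes <= limits["short"]:
        return "short"
    if minutes <= limits["medium"]:
        return "medium"
    return "long"
```

The same elapsed time can therefore fall into different categories for different products, which is the behaviour the text describes.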

As a delay progresses, the system tracks the actual delay length to determine if the initial estimate was
correct. If the delay terminates prematurely or lasts longer than initially estimated, the advice that
was originally issued about burner settings is likely to be unsuitable. Table 1 illustrates the possible
outcomes for combinations of initial delay estimates and actual delay times. The system is able to detect
this discrepancy quickly and issue a revised advisory. This advice changes dynamically based on the new
estimated delay time as well as the elapsed delay time. Table 2 shows how the system can adjust to
meet the requirements of a correction in the actual delay duration.


Table  1.  Possible  Outcomes  of  Delay  Time  Estimation  and  Error  Consequences. 


 | Predicted Short Delay | Predicted Medium Delay | Predicted Long Delay
Actual Delay is Short | Accurate delay estimate | Furnace too cold, delay in heating back to temperature | [cell missing in source]
Actual Delay is Medium | Furnace not cooled enough, billets may overheat | Accurate delay estimate | Furnace too cold, delay in heating back to temperature
Actual Delay is Long | Furnace not cooled enough, billets may overheat and spend too much time at high temperatures | Furnace not cooled enough, billets may overheat, possibly long time at high temperatures | Accurate delay estimate

Table  2.  Possible  Outcomes  of  Delay  Time  Estimation  and  Control  Responses. 


 | Predicted Short Delay | Predicted Medium Delay | Predicted Long Delay
Actual Delay is Short | Well understood and simple control | Furnace too cold, advise heating regimen | Furnace much too cold, advise a rapid heating regimen
Actual Delay is Medium | Furnace not cooled enough, advise lower zone set points | Well understood and simple control | Furnace too cold, advise a heating regimen
Actual Delay is Long | Furnace not cooled enough, advise lower zone set points, track overheat times | Furnace not cooled enough, advise lower zone set points | Well understood and simple control
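The control responses in Table 2 amount to a lookup keyed by the predicted and the actual delay category. A minimal sketch follows; the dictionary and function names are hypothetical, and the response strings are copied from the table rather than from the system's actual advisory text.

```python
from typing import Dict, Tuple

SIMPLE = "Well understood and simple control"

# Keys are (predicted category, actual category); values follow Table 2.
CONTROL_RESPONSES: Dict[Tuple[str, str], str] = {
    ("short", "short"): SIMPLE,
    ("medium", "short"): "Furnace too cold, advise heating regimen",
    ("long", "short"): "Furnace much too cold, advise a rapid heating regimen",
    ("short", "medium"): "Furnace not cooled enough, advise lower zone set points",
    ("medium", "medium"): SIMPLE,
    ("long", "medium"): "Furnace too cold, advise a heating regimen",
    ("short", "long"): "Furnace not cooled enough, advise lower zone set points, "
                       "track overheat times",
    ("medium", "long"): "Furnace not cooled enough, advise lower zone set points",
    ("long", "long"): SIMPLE,
}


def control_response(predicted: str, actual: str) -> str:
    """Return the Table 2 response for a predicted and an actual delay category."""
    try:
        return CONTROL_RESPONSES[(predicted, actual)]
    except KeyError:
        raise ValueError(f"unknown delay categories: {predicted!r}, {actual!r}")
```

Re-evaluating this lookup whenever the elapsed delay crosses into a new category gives the kind of revised advisory the text describes.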

IMPLEMENTATION  OF  THE  EXPERT  SYSTEM 

System implementation was divided into two distinct phases. The first phase was long-term data collection
and logging. In this stage, inputs from the furnace controllers were recorded along with some inputs
normally seen only by furnace operators. The data were compiled into daily log files and subsequently
added to the cumulative collection of files in the plant database. These files, along with hard copies of
schedules, were remotely retrieved and analyzed both by plant engineers at the mill and by personnel at
UBC. The second phase involved construction of the knowledge base and its insertion into a man/machine
interface designed to look similar to the one with which the operators were already familiar. The system
could detect transient conditions and provide timely advice to the operator in both steady-state and
transient operating conditions.

Phase  I  of  the  system  implementation  involved  installation  of  a  data-logger  to  facilitate  long  term  collection 
of  furnace  control  knowledge.  Developing  the  man-machine  interface  was  considered  of  utmost  importance 
due  to  the  need  for  operator  acceptance  [5]  and  the  creation  of  a  method  to  verify  the  expert  system. 

The billets are scanned for their temperature profile, both to supply information to the Expert System and to display it to the operators. Information such as the maximum and minimum temperatures and the shape of the profile decides the fate of a billet. A billet that is not up to standard for a particular job may be rejected prior to rolling, or the rolled product may be held for quality control assessment. The user interface was created to display the most important operating data on a "main" screen and supplementary information on secondary screens accessible by "clicking" a button on the main screen. Figure 2 illustrates the main screen, which makes a large amount of information available to the operator in a high-density graphical form, as seen in the billet temperature profile and the historical profile limits in the lower portion of the display. Also included in this display are a billet heat-tracking and operating status message strip.
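The accept-or-hold decision described above can be pictured with a minimal sketch. The limit values, the `ProfileLimits` fields and the `assess_billet` function are hypothetical illustrations; the paper does not give the mill's actual acceptance criteria.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ProfileLimits:
    """Hypothetical acceptance limits for one job, in degrees C."""
    min_temp: float    # coldest allowed point on the billet
    max_temp: float    # hottest allowed point on the billet
    max_spread: float  # largest allowed (max - min) along the billet


def assess_billet(profile: Sequence[float], limits: ProfileLimits) -> str:
    """Return 'roll' if the scanned profile is within limits, else 'hold'."""
    if not profile:
        raise ValueError("empty temperature profile")
    lo, hi = min(profile), max(profile)
    within = (lo >= limits.min_temp
              and hi <= limits.max_temp
              and (hi - lo) <= limits.max_spread)
    return "roll" if within else "hold"


# Assumed limits, chosen only to exercise the function.
limits = ProfileLimits(min_temp=1050.0, max_temp=1200.0, max_spread=80.0)
print(assess_billet([1120.0, 1150.0, 1165.0, 1140.0], limits))
print(assess_billet([1010.0, 1150.0, 1165.0, 1140.0], limits))
```

A fuller version would also compare the profile shape against the historical limits shown on the main screen; that part is omitted because the paper does not describe how the shape is scored.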




Fig.  2  Main  Screen  of  the  Expert  System. 


The second phase of the project included installation of a knowledge base containing plant procedures and the development of an advisory screen that the operator is prompted to read during transition or transient operating events. The expert system provides static advice as well as advice based on delay conditions. If the operator provides an estimate of the delay length, the system presents information appropriate to the situation. If the actual delay turns out to be longer or shorter than predicted, the advice changes to suit the situation, based on information contained within the knowledge base.
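The delay-based advice logic can be sketched as a simple lookup that is re-evaluated as the delay unfolds. This is only an illustration under stated assumptions: the 10 and 30 minute cut-offs, the advice strings and the function names are hypothetical and are not taken from the plant knowledge base.

```python
# Placeholder advice text; a real knowledge base would hold the plant procedures.
ADVICE = {
    "short": "Follow the short-delay procedure.",
    "medium": "Follow the medium-delay procedure.",
    "long": "Follow the long-delay procedure.",
}


def delay_class(minutes: float) -> str:
    """Bucket a delay length. The 10 and 30 minute cut-offs are hypothetical."""
    if minutes < 0:
        raise ValueError("delay length cannot be negative")
    if minutes < 10:
        return "short"
    if minutes < 30:
        return "medium"
    return "long"


def advise(estimated_min: float, elapsed_min: float = 0.0,
           delay_over: bool = False) -> str:
    """Return advice for the operator's estimate, revised as the delay unfolds."""
    if delay_over:
        effective = elapsed_min  # the delay has ended, so its true length is known
    else:
        effective = max(estimated_min, elapsed_min)  # an overrun replaces the estimate
    return ADVICE[delay_class(effective)]


print(advise(5))                                   # estimate only
print(advise(5, elapsed_min=12))                   # delay has outlasted the estimate
print(advise(40, elapsed_min=8, delay_over=True))  # delay ended early
```

The point of the sketch is the re-evaluation step: the same rule base is queried again whenever the elapsed time contradicts the operator's initial estimate.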


RESULTS  AND  DISCUSSION 

The system is stocked with declarative and procedural knowledge from the Standard Operating Practices as well as new knowledge from scenario "role playing" using the UBC transient furnace model. This knowledge forms the basis of the expert system, providing an operator with knowledge gathered and refined from years of experience and from recent scenario development. When the knowledge base is applied to complex delay situations, the system can offer timely advice for a myriad of transient conditions.

The system records and archives time-stamped temperature profiles of the billets for later scrutiny. Furnace operating data were also made available to the modeling experts. The zone temperature records can be correlated with the billet temperature data. In addition to these traces, gas flows, air flows, and total furnace pressure resulting from operator control of the PLCs were also available for analysis. An example of a billet temperature profile is illustrated in Figure 2. This example illustrates the change in temperature profile which can occur during a short delay. The dip in billet temperature is due to the change in thermal history prior to discharge, a result of the operator's response and manipulation of burner controls.

These responses can then be analyzed via a 3D mathematical model to determine the efficiency of the operator's response as well as the possibility of creating a better response without needing to experiment with the actual furnace controls. The analysis can then be expanded to include theoretical responses to incorrect initial actions based on inaccurate initial operator delay estimates. This information can then be placed into the expert system to provide timely new advice as an initially misdiagnosed delay progresses.




Fig. 3. Sequence Plot of Billet Temperature Profiles. This example contains a 7.5 minute delay midway along the time sequence.

During  the  final  phase  of  furnace  modeling,  we  were  able  to  create  some  general  response  scenarios  with 
respect  to  nominal  operator  control  as  well  as  developed  control  schemes  of  varying  complexity.  Several 
new  procedures  were  postulated  and  tested  for  a  number  of  variations.  The  result  was  a  set  of  new  post 
delay  procedures  which  the  model  deemed  to  be  more  effective.  We  assembled  a  set  of  results  from  the 
modeling  efforts  which  included  examination  of  the  effectiveness  of  three  delay  strategies  as  follows: 


1. null strategy - no action is taken during or after this delay

2.  basic  strategy  -  as  is  currently  prescribed 

3.  fine  strategy  -  fine  adjustments  to  the  operation  during  and  after  the  delay 


The firing strategies noted above are summarized in Table 3 and Table 4 [3]. The modeled outcomes of these strategies are summarized in Figure 4 and Figure 5 [3]. It is clear from these results that the null strategy, as expected, produces greatly overheated billets and that the fine control strategy produces billets with a much lower standard deviation in temperature. These procedures were presented to the mill for validation prior to assimilation into the mill SOP.


Table 3. Basic Post-Delay Firing Strategy.

Period                                   Charge Zone   Heat Zone   Soak Zone
                                         [Nm3/h]       [Nm3/h]     [Nm3/h]
delay (70 min)                           280           280         280
1st 10 min. after delay (70-80 min)      280           280         150
next 20 min. after delay (80-100 min)    1850          280         150
next 5 min after delay (100-105 min)     1850          280         400
steady state                             1850          900         400

Table 4. Fine Post-Delay Firing Strategy.

Period                                   Charge Zone   Heat Zone   Soak Zone
                                         [Nm3/h]       [Nm3/h]     [Nm3/h]
delay (70 min)                           280           280         280
1st 10 min. after delay (70-80 min)      280           280         150
2nd 10 min. after delay (80-90 min)      1850          280         200
3rd 10 min. after delay (90-100 min)     1850          280         200
4th 10 min. after delay (100-105 min)    1850          280         300
105-140 min                              1850          900         300
140-150 min                              1850          900         400
150-155 min                              1850          900         600
steady state                             1850          900         400
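The two firing strategies are piecewise-constant schedules, so they can be encoded as lookup tables keyed on time. The sketch below is one hedged reading of Tables 3 and 4: it assumes that time is counted in minutes from the start of the 70-minute delay, that steady state begins where the last listed interval ends, and that the three values in each row of Table 3 are in the same charge, heat, soak order as the columns of Table 4. The names `BASIC`, `FINE` and `flows_at` are illustrative.

```python
from bisect import bisect_right

# Each entry: (start minute, (charge, heat, soak) gas flow in Nm3/h).
BASIC = [
    (0,   (280, 280, 280)),    # during the delay
    (70,  (280, 280, 150)),    # 1st 10 min after the delay
    (80,  (1850, 280, 150)),   # next 20 min
    (100, (1850, 280, 400)),   # next 5 min
    (105, (1850, 900, 400)),   # steady state (assumed to start at 105 min)
]
FINE = [
    (0,   (280, 280, 280)),
    (70,  (280, 280, 150)),
    (80,  (1850, 280, 200)),
    (90,  (1850, 280, 200)),
    (100, (1850, 280, 300)),
    (105, (1850, 900, 300)),
    (140, (1850, 900, 400)),
    (150, (1850, 900, 600)),
    (155, (1850, 900, 400)),   # steady state (assumed to start at 155 min)
]


def flows_at(schedule, minute):
    """Return the (charge, heat, soak) flows in effect at the given minute."""
    starts = [start for start, _ in schedule]
    idx = bisect_right(starts, minute) - 1
    if idx < 0:
        raise ValueError("minute precedes the start of the schedule")
    return schedule[idx][1]


print(flows_at(BASIC, 85))   # (1850, 280, 150)
print(flows_at(FINE, 152))   # (1850, 900, 600)
```

Written this way, the difference between the strategies is easy to see: the fine strategy ramps the third column in smaller steps and overshoots briefly to 600 Nm3/h before settling at the steady-state value.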



Figure  5  Comparison  of  a  Basic  Fine  Control  Strategy  with  a  Null  Strategy. 

It had been believed that about 40% of production was constrained by rolling mill delays, leaving the remaining 60% of the billets in the furnace at full burner capacity. Logged furnace data used for model development showed that this was not the case, since not all of the furnace zones were maintained under heavy firing conditions with large 8" billets. In fact, the manner in which the furnace was operated may have affected throughput. Further examination of these procedures is required to assess the feasibility of increasing furnace throughput by altering the ingrained operating procedures in light of this new knowledge. It is clear from this example of mathematically modeled delay response strategies that the expert system can be made more effective by the addition of externally generated knowledge.

The  initial  response  to  Phase  I  of  the  installation  showed  great  reluctance  on  the  part  of  the  operators  as 
they  were  suspicious  that  mill  management  intended  to  replace  furnace  operators  with  a  computer  control 
system. This is consistent with the "Feigenbaum Bottleneck", which includes reluctance from experts due to the fear of "loss of work" from AI technology and automation [2,9]. These fears were alleviated by
carefully  demonstrating  and  explaining  the  system  to  the  operators  to  clarify  our  objectives  and  to  explain 
that  our  goal  was  to  supplement  the  operators'  "toolbox"  with  on-line  presentation  of  the  appropriate 
operating  procedures.  Following  these  discussions,  this  initial  phase  was  successfully  completed. 

Phase  II  was  essentially  a  seamless  addition  to  the  interface.  All  interface  screens  from  Phase  I  remained 
intact  with  the  addition  of  an  advisory  screen  to  provide  on-line  advice,  based  on  the  knowledge  within  the 
newly installed knowledge base and delay-detection rules. As expected, this was the phase of the project for which general acceptance from furnace operators was most difficult to obtain. This reluctance was due mainly to the past nature of furnace control, in which operators were allowed to implement any control procedure in order to produce billets with minimal mill delay. Unfortunately, these decisions often produced billets at the expense of product quality or overall production. However, since the expert system goals are being championed by the supervising combustion engineer, this reluctance will eventually be overcome.


CONCLUSION 

This  project  has  examined  problems  involved  in  controlling  a  steel  reheat  furnace  at  an  Alberta  mini-mill. 
On  completion  of  the  work,  the  following  conclusions  can  be  made: 




•  Control  of  billet  reheating  furnaces  has  proven  to  be  well-suited  to  an  expert  system  approach  due  to 
the  heuristic  nature  of  the  process. 

•  An  Expert  System  consisting  of  a  data  collection  module  and  advisory  module  was  developed  and 
successfully  installed  at  an  Alberta  mill.  The  system  is  still  in  operation  3  years  after  installation.  This 
implementation  can  be  considered  a  success  because  of  the  continued  use  by  the  mill  personnel. 

•  An  on-line  and  off-line  statistical  process  database  was  created  for  forensic  analysis  of  operations. 

•  The  static  and  transient  furnace  models  developed  during  this  project  were  effectively  used  to  study 
control  conditions  and  procedures  leading  to  new  knowledge  which  increased  mill  efficiency.  This 
knowledge  was  assimilated  into  the  SCADA  system  for  on-line  use. 

•  The  system  allows  creation  of  an  operating  standard  in  which  inexperienced  or  undisciplined  operators 
must  use  proven  procedures  designed  to  increase  productivity  and  decrease  operating  losses. 

•  The system has become an integral tool both for operators, who benefit from the man-machine interface, and for mill managers as they attempt to improve the operating bottom line via process efficiency.


ACKNOWLEDGEMENTS 

CM  would  like  to  express  his  appreciation  to  Dr.  Peter  Barr  and  Dr.  John  Meech  for  their  support  and 
guidance  towards  the  completion  of  this  project  and  to  thank  the  late  Dr.  J.  Keith  Brimacombe,  Dr.  Indira 
Samarasekera,  the  Centre  for  Metallurgical  Process  Engineering  and  NSERC  for  financial  and  moral 
support in this endeavor. A debt of gratitude is owed to Vladimir Rakocevic for his invaluable assistance in creating the hardware drivers. Ken Scholey and Edmund Osinski are acknowledged for their modeling work, as are the following personnel at Alta Steel for their assistance: Bob Pugh, Doug Ostafichuk, Dennis Gutknecht, Mark Burrough, Ed Duchesne and the furnace operators.

We  wish  to  acknowledge  the  financial  and  in-kind  support  of  the  following  sponsors  of  this  research:  Alta 
Steel,  Manitoba  Rolling  Mills,  Accumold  Ltd.,  Comdale  Technologies  (Canada)  Inc.,  and  the  Natural 
Sciences  and  Engineering  Research  Council  of  Canada. 


REFERENCES 

1. C.L.B. Mui, 1998. Steel Billet Reheating: An Expert Approach. UBC MASc Thesis.

2. J. Efstathiou, 1989. Expert Systems in Process Control, 114-138.

3. P.V. Barr, K.E. Scholey, E. Osinski, 1997. Modeling of Steady-State and Transient Billet Furnace Operation. Unpublished work in progress.

4. K.E. Scholey, 1996. A Transient, Three-Dimensional, Thermal Model of a Billet Reheating Furnace. UBC PhD Thesis.

5. L. Boullart, A. Krijgsman, R.A. Vingerhoeds, 1992. Application of Artificial Intelligence in Process Control: Lecture Notes Erasmus Intensive Course, 127-141.

6. J.K. Brimacombe, 1993. Towards the Intelligent Mould for the Continuous Casting of Steel Billets. 76th Steelmaking Conference, Iron and Steel Society, Empowerment with Knowledge, 3-27.

7. V. Rakocevic, J.A. Meech, 1995. Computational Intelligence in a real-time SCADA system to monitor and control continuous casting of steel billets. IFSA '95, Sao Paulo, Brazil, July 22-28.

8. J.C. Bezdek, 1994. Computational Intelligence - Imitating Life. IEEE World Congress on Computational Intelligence (WCCI), What is computational intelligence?, 1-12.

9. J.A. Meech, 1995. AI Applications in the Mining Industry into the 21st Century. APCOM XXV Conference, Brisbane, May 9-14, 95-96.




Simulation  and  Analysis  of  Thin  Strip  Casting  Processes 

Yogeshwar  Sahai  and  Manish  Gupta 

Department  of  Materials  Science  and  Engineering 
The  Ohio  State  University,  Columbus  OH,  USA. 

ABSTRACT 

An  overview  of  the  thin  strip  casting  processes  for  steel  and  aluminum  is  presented  in  this  work.  A  two- 
dimensional  finite  element  mathematical  model,  capable  of  simulating  the  turbulent  fluid  flow,  heat  transfer 
and  solidification,  and  thermally  induced  stresses,  is  presented. 

INTRODUCTION 

Thin  strip  casting  processes  for  casting  molten  metal  directly  into  thin  strips  of  desired  thickness  and  width 
are  currently  being  developed  around  the  world.  These  processes  have  the  following  advantages  over 
conventional  continuous  casting  processes: 

1)  Energy  savings  in  heating  and  deformation  in  hot  and  cold  rolling. 

2)  Reduced  production  time,  which  increases  the  efficiency  of  the  process. 

3)  Reduced  segregation  due  to  decreased  solidification  time. 

4)  Refined  microstructure  due  to  faster  cooling. 

Three  thin  strip  casting  processes  have  shown  the  potential  for  industrial  production  of  thin  strips  of  steel. 
These are the melt drag process, the twin-roll process, and the two-roll melt drag method.


Fig.  1.  Schematic  of  a  two-roll  melt  drag  thin  strip  caster. 

In the melt drag process, only one side of the strip solidifies on the roller while the other side solidifies in the open atmosphere, which results in a poor surface finish. To obtain a better surface finish on both sides, the twin-roll caster is used. In this process, both sides of the strip solidify in contact with a roller surface. In the two-roll melt drag process, as shown in Fig. 1, a roller of smaller or equal diameter is placed on the top free surface, which improves the quality of the top cast surface. Generally, the melt drag process is most suitable for casting strips up to 2 mm thick, while the two-roll melt drag and twin-roll strip casting processes are preferable for casting strips up to 7 mm thick. Fig. 2 shows the typical velocity vectors in the melt pool and rolls for a two-roll melt drag caster.

For industrial production of thin strips of aluminum, twin-roll casters have proven to be the most economical and efficient machines. In a horizontal twin-roll caster, molten aluminum is fed from a refractory feed tip into the gap between two counter-rotating, water-cooled cylindrical steel rolls. The


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




schematic of this caster is shown in Fig. 3. Unlike thin strip casting of steel, solidification is complete well before the kissing point during casting of aluminum sheets, and the material undergoes a considerable amount of rolling before it leaves the rolls. Also, there is a relative velocity between the cast strip and the rolls, which results in shear stresses at the metal/roll interface.


Fig.  2.  Typical  velocity  vectors  in  the  melt  pool  and  rolls  for  a  two-roll  melt  drag  thin  strip  caster. 


Fig.  3.  Schematic  of  a  horizontal  type  twin-roll  thin  strip  aluminum  caster. 

A  significant  amount  of  work  has  been  done  by  Gupta  and  Sahai  [1]  to  mathematically  model  these  thin 
strip  casting  processes.  In  their  work,  they  simulated  melt  drag,  twin  roll  and  two  roll  melt  drag  thin  strip 
casting  processes. 

The strip casting process involves solidification of liquid metal and, in the case of aluminum, the solidified metal undergoes a considerable amount of rolling before it leaves the kissing point. Solidification of molten metal starts at the point of first metal-roll contact and is complete at or before the kissing point. During the process, molten metal experiences a very high rate of heat extraction, which results in very high thermal stresses in the material. These stresses arise from thermal gradients in the material, rolling action, the metallostatic pressure of the unsolidified melt pool, and friction between roller and strip. They can have a deleterious effect on the quality of the cast product. Tensile stresses can lead to the formation of a variety of surface and internal cracks. Hence, it is of great importance to understand the development and nature of thermal stresses. The modeling of stresses in thin strip casting processes has received very little attention from researchers in the past. During the thin strip casting of aluminum, the major problems in the industry today are the various defects in the cast product. Casting defects such as centerline segregation [2], heat line formation [3], and sticking occur during strip casting.

This paper presents some results of a two-dimensional mathematical model of heat transfer, turbulent fluid flow, solidification and thermally induced stress. Two casting processes, the two-roll melt drag method for thin strip casting of steels and a horizontal twin-roll casting process for aluminum, are presented in detail. In aluminum strip casting, it has been observed that the extent of centerline segregation increases with the length of the solidification interval, or sump depth. The effect of process variables on the sump depth and thermal stresses is presented for a typical twin-roll aluminum caster. For two-roll melt drag casting of steel, the effect of process variables on the cast strip thickness, flow profile and thermal stresses is presented. Fluid flow and heat transfer calculations are performed using the commercial software package FIDAP, and the temperature profiles obtained at steady state are imposed as a thermal load on the mechanical system. Stress analysis calculations are performed using another commercial package, ANSYS. The stress analysis model uses a viscoplastic constitutive equation to describe the material behavior at temperatures close to the melting point.

MATHEMATICAL  MODEL 

A  general  mathematical  model  is  described  here  and  its  details  can  be  found  elsewhere  [4].  This  model  is 
capable  of  simulating  all  thin  strip  casting  processes  for  steel  and  aluminum,  with  some  differences,  such  as 
velocity  and  load  boundary  conditions. 

The  mathematical  model  involves  the  following  assumptions: 

1)  The  process  is  at  steady  state.  After  a  small  initial  transient  period,  process  parameters  do  not  change 
with  time. 

2)  By  taking  into  account  the  fact  that  width/thickness  ratio  is  very  large  and  ignoring  the  end  effects,  the 
geometry  of  the  process  can  be  approximated  as  two-dimensional. 

3)  Liquid  metal  is  incompressible  and  is  a  Newtonian  fluid. 

4)  Material  is  assumed  to  be  isotropic  in  nature. 

5)  The  elastic  strains  are  small  relative  to  plastic  strains. 

6)  The  plastic  flow  of  material  is  assumed  to  be  isochoric. 

7)  Material  properties  (except  viscosity  and  specific  heat)  are  temperature  independent. 

8)  Heat  losses  by  radiation  are  negligible. 

Under  the  above  assumptions,  the  following  equations  are  solved  in  the  calculation  domain, 

•  Continuity  equation 

•  Turbulent  Navier-Stokes  equation 

•  Equation  for  conservation  of  thermal  energy 

•  Two equations for k and ε to model turbulence.

•  Equation  for  mechanical  equilibrium 
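For orientation, the equations listed above can be written in their generic steady-state textbook forms. These are not necessarily the exact forms solved in FIDAP and ANSYS for this work, which are given in [4]. The symbols are assumptions introduced here: u is velocity, p pressure, ρ density, μ and μ_t the molecular and turbulent viscosities, c_p specific heat, k_eff the effective thermal conductivity, G the turbulence production term, σ the stress tensor and b the body force. Latent heat release would typically enter through a source term or through the temperature-dependent specific heat.

```latex
% Generic steady-state forms, assuming incompressible Newtonian flow
\begin{align}
\nabla \cdot \mathbf{u} &= 0 \\
\rho\,(\mathbf{u}\cdot\nabla)\mathbf{u} &= -\nabla p
  + \nabla\cdot\left[(\mu + \mu_t)\left(\nabla\mathbf{u} + \nabla\mathbf{u}^{T}\right)\right]
  + \rho\,\mathbf{g} \\
\rho\,c_p\,\mathbf{u}\cdot\nabla T &= \nabla\cdot\left(k_{\mathrm{eff}}\,\nabla T\right) \\
\rho\,\mathbf{u}\cdot\nabla k &= \nabla\cdot\left[\left(\mu + \frac{\mu_t}{\sigma_k}\right)\nabla k\right]
  + G - \rho\,\varepsilon \\
\rho\,\mathbf{u}\cdot\nabla\varepsilon &= \nabla\cdot\left[\left(\mu + \frac{\mu_t}{\sigma_\varepsilon}\right)\nabla\varepsilon\right]
  + C_1\,\frac{\varepsilon}{k}\,G - C_2\,\rho\,\frac{\varepsilon^{2}}{k} \\
\nabla\cdot\boldsymbol{\sigma} + \mathbf{b} &= \mathbf{0}
\end{align}
% Turbulent viscosity in the standard k-epsilon model
\[ \mu_t = C_\mu\,\rho\,\frac{k^{2}}{\varepsilon} \]
```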

The above equations are solved with appropriate boundary conditions. The strip velocity in the aluminum thin strip caster is about 10% higher than that of the rolls.

RESULTS  AND  DISCUSSION 

Two-roll  melt  drag  process  for  strip  casting  of  steels 

Process  variables  that  affect  the  thickness  of  cast  strip  significantly  were  identified  as: 

(i)  Roller  speed 

(ii)  Gap  heat  transfer  coefficient 

(iii)  Melt  superheat 

Effect  of  roller  speed  on  cast  strip  thickness 

Fig. 4 shows the variation in strip thickness with roller speed. It can be seen that the strip thickness decreases with an increase in roller speed. This is because, at higher casting speed, the contact time of the melt with the roller is shorter, and hence less time is available for solidification of the liquid metal, which results in thinner strips. From Fig. 4, it is also evident that the growth rate for low roller speeds is higher than that for high roller speeds.
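A power-law trend line of the kind fitted to the thickness data can be obtained by ordinary least squares in log-log space. The sketch below shows the technique only: the data points are synthetic, generated from an assumed law y = 2 x^(-0.5), and are not the model results of this paper. The function name `fit_power_law` is illustrative.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b in log-log space; returns (a, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    if sxx == 0.0:
        raise ValueError("all x values are identical")
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                       # slope in log-log space is the exponent
    a = math.exp(mean_y - b * mean_x)   # intercept gives the prefactor
    return a, b


# Synthetic data from an assumed law y = 2 * x**-0.5 (not data from this paper).
speeds = [0.25, 0.5, 1.0, 2.0, 4.0]
thickness = [2.0 * s ** -0.5 for s in speeds]
a, b = fit_power_law(speeds, thickness)
print(round(a, 3), round(b, 3))
```

A negative exponent corresponds to thickness falling as roller speed rises, with the steepest change at low speeds, which is the behavior described in the text.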


[Fig. 4 plot: strip thickness data points with a power-law trend line, $y = 0.8764\,x^{-0.6754}$, $R^2 = 0.9768$.]

Fig. 4. Strip thickness as a function of roller speed.


Effect  of  gap  heat  transfer  coefficient  on  cast  strip  thickness 

The gap heat transfer coefficient governs the rate of heat transfer from the melt pool to the roller. Fig. 5 shows the variation in strip thickness with gap heat transfer coefficient. It can be seen from the figure that as the gap heat transfer coefficient increases, the thickness of the cast strip increases.


Fig.  5.  Strip  thickness  as  a  function  of  gap  heat  transfer  coefficient. 

Effect  of  melt  superheat  on  cast  strip  thickness 

Fig. 6 shows the variation in strip thickness with melt superheat. A larger amount of heat must be removed through the solidifying strip in the case of liquid steel with higher superheat. This, in turn, reduces the solidification rate. It can be seen that an increase in the superheat of the liquid steel reduces the solidified thickness of the shell. The effect of superheat on strip thickness is not large, because the major part of the heat comes from the release of latent heat of fusion and not from the superheat. The sensible heat due to superheat is much smaller than the latent heat of fusion.
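The relative size of the two heat contributions can be checked with a short calculation. The property values below are round illustrative numbers for steel, assumed here and not taken from the paper; the function name is likewise illustrative.

```python
def superheat_share(superheat_k: float, cp_j_per_kg_k: float,
                    latent_j_per_kg: float) -> float:
    """Fraction of the heat removed per kg that comes from superheat."""
    if superheat_k < 0 or cp_j_per_kg_k <= 0 or latent_j_per_kg <= 0:
        raise ValueError("inputs must be physically meaningful")
    sensible = cp_j_per_kg_k * superheat_k  # heat to cool the liquid to its freezing point
    return sensible / (sensible + latent_j_per_kg)


# Assumed round values: cp = 800 J/(kg K), latent heat = 270 kJ/kg.
CP_LIQUID = 800.0
LATENT = 270_000.0

for superheat in (25.0, 50.0, 100.0):
    share = superheat_share(superheat, CP_LIQUID, LATENT)
    print(f"superheat {superheat:5.0f} K -> {100 * share:4.1f}% of heat removed")
```

With these assumed values, even a 50 K superheat accounts for only about one eighth of the heat to be extracted before solidification is complete, consistent with the weak dependence of strip thickness on superheat.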


[Fig. 6 plot: strip thickness data points with a linear trend line, $y = -0.0022x + 1.1$, $R^2 = 0.9878$.]


Fig.  6.  Strip  thickness  as  a  function  of  melt  superheat. 

Stress model predictions

The  stress  model  can  be  used  to  evaluate  conditions  under  which  cracks  may  appear  in  the  solidifying 
body.  In  this  study,  the  cracking  criterion  proposed  by  Ramacciotti  [5]  is  employed,  in  which  a  temperature 
dependent  ultimate  strength  is  used  as  a  reference  for  cracking  criterion: 

$$\sigma_R = C\,(T_{sol} - T)^{0.5}\ \text{(MPa)} \qquad (1)$$

where $C = 1.2$ MPa K$^{-0.5}$.

A cracking index was defined based on the principal stress as C.I. $= (\sigma_p/\sigma_R)_{max}$, where $\sigma_p$ is the magnitude of the principal stress.

A  positive  cracking  index  at  a  point  means  that  the  material  is  subjected  to  tensile  stresses  at  that  point 
whereas  a  negative  cracking  index  would  mean  the  material  is  subjected  to  compressive  stresses.  Cracks 
can  appear  at  points  with  C.I.  greater  than  one,  i.e.  the  points  subjected  to  tensile  stresses,  while  cracks  will 
not  appear  at  the  points  under  the  compressive  stresses. 
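The criterion can be evaluated point by point along the strip. The sketch below is a minimal illustration: the solidus temperature, the nodal temperatures and the stresses are hypothetical, and the value of the constant C passed in is an assumed reading of the constant quoted above. Stresses are signed, with tension positive.

```python
import math


def ultimate_strength_mpa(temp_c: float, solidus_c: float, c: float) -> float:
    """Reference strength sigma_R = C * (T_sol - T)**0.5, zero at or above solidus."""
    return c * math.sqrt(max(solidus_c - temp_c, 0.0))


def cracking_index(stresses_mpa, temps_c, solidus_c: float, c: float) -> float:
    """Largest ratio of principal stress to reference strength over all points."""
    ratios = []
    for stress, temp in zip(stresses_mpa, temps_c):
        strength = ultimate_strength_mpa(temp, solidus_c, c)
        if strength <= 0.0:
            continue  # no solid strength is defined at or above the solidus
        ratios.append(stress / strength)
    if not ratios:
        raise ValueError("no points below the solidus temperature")
    return max(ratios)


# Hypothetical nodal values: tension (+15 MPa) at 1300 C, compression (-10 MPa) at 1000 C.
ci = cracking_index([15.0, -10.0], [1300.0, 1000.0], solidus_c=1400.0, c=1.2)
print(ci, "-> cracking possible" if ci > 1.0 else "-> no cracking expected")
```

In this example the first point gives 15 / (1.2 x 10) = 1.25, so the index exceeds one and cracking would be flagged, while the compressive point contributes a negative ratio and is ignored by the maximum.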

Fig. 7 shows the variation in the cracking index across the width of the cast strip at the point where it leaves the rolls, in two-roll melt drag thin strip casting under three different casting conditions. In the finite element mesh, there are 45 nodes along the width of the strip. Point 1 represents the node at the point of contact between the strip and the upper roller, and point 45 represents the node at the point of contact between the cast strip and the lower roller. It is evident from Fig. 7 that the susceptibility of the material to failure due to thermal stresses is highest at points close to the two rolls and lowest at locations in the middle of the cast strip. The magnitude of thermal stresses in the cast strip increases as the casting speed decreases. This is because, at low casting speed, the shell of solidified metal on the rolls is thicker, which reduces the rate of heat transfer from the melt pool to the rolls (the effective thermal conductivity of liquid metal is much higher than that of solidified metal). This leads to higher thermal gradients in the material and consequently to higher thermal stresses.

Horizontal  type  twin-roll  thin  strip  casting  of  aluminum 

The  cause  of  centerline  segregation  is  the  interdendritic  fluid  motion  through  the  partly  solid  region,  which 
arises  because  of  the  pressure  gradient  in  the  roll  gap.  The  intensity  of  segregation  or  the  liquid  motion 
between  the  dendritic  arms  is  influenced  by  the  local  solidification  time,  which  is  defined  as  the  time  at  a 




given  location  in  a  casting  between  initiation  and  completion  of  solidification.  The  length  of  solidification 
interval  or  the  sump  depth  increases  as  solidification  time  increases.  Sump  depth  is  a  function  of  alloy 
freezing  range,  casting  speed,  strip/roll  gap  heat  transfer  coefficient  and  inlet  temperature  of  liquid  metal. 


[Fig. 7 plot: cracking index across the strip (at the kissing point), with three curves: solidification over before the kissing point; solidification over just before the kissing point; solidification over after the kissing point.]


Fig.  7.  Variation  in  cracking  index  across  the  cast  strip  in  a  two-roll  melt  drag  thin  strip  caster. 

The liquidus and solidus temperatures of the alloy are 660°C and 640°C, respectively. It is assumed that the alloy behaves as a liquid at temperatures above the mean of the liquidus and solidus temperatures, which is 650°C, and as a solid at temperatures below 650°C.

The position of the solid/liquid interface is shown in Fig. 8. The distance between the nozzle and the completion of solidification is defined as the sump depth. Figs. 9, 10, and 11 show the effect of casting speed, slab/roll gap heat transfer coefficient and melt superheat, respectively, on the sump depth.


Fig.  8.  Solid-liquid  interface  in  the  melt  pool  for  a  twin-roll  aluminum  thin  strip  caster. 

Of  these  parameters,  variation  in  casting  speed  has  the  largest  influence  on  the  process.  Higher  casting 
speed  gives  less  time  for  solidification  and  hence  allows  greater  motion  of  liquid  between  the  dendritic 
arms. From Fig. 9, it can be seen that for a casting speed of 5 mm/sec, the sump depth is around 18 mm and




this  becomes  35  mm  for  casting  speed  of  25  mm/sec.  This  effect  of  casting  speed  seriously  limits  the  strip 
casting  process  to  relatively  short  freezing  range  alloys  and  relatively  low  production  rates. 
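The two sump depths quoted from Fig. 9 allow a rough estimate of how long the metal spends between the nozzle and the end of solidification. The sketch below assumes, as a simplification not made in the paper, that the metal traverses the sump at the casting speed; the function name is illustrative.

```python
def sump_transit_time_s(sump_depth_mm: float, casting_speed_mm_s: float) -> float:
    """Approximate time to traverse the sump, assuming travel at casting speed."""
    if casting_speed_mm_s <= 0:
        raise ValueError("casting speed must be positive")
    return sump_depth_mm / casting_speed_mm_s


# (casting speed in mm/sec, sump depth in mm) as quoted in the text for Fig. 9.
for speed, depth in [(5.0, 18.0), (25.0, 35.0)]:
    t = sump_transit_time_s(depth, speed)
    print(f"{speed:4.0f} mm/sec, sump {depth:4.0f} mm -> about {t:.1f} s")
```

Under this assumption the transit time falls from about 3.6 s to about 1.4 s as the casting speed rises from 5 to 25 mm/sec, even though the sump itself becomes nearly twice as deep.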




Fig.  9.  Sump  depth  as  a  function  of  roller  speed. 

The gap heat transfer coefficient controls the removal of heat from the aluminum strip into the steel roll. Fig. 10 shows that below ~7000 kW/m2 K, changes in the heat transfer coefficient, due for example to non-uniformity of the roll surface, can significantly affect sump depth. This may be responsible for the periodic variation in the amount of centerline segregation that is often seen in the cast strip.


Fig.  10.  Sump  depth  as  a  function  of  gap  heat  transfer  coefficient. 

Fig. 11 shows the increase in sump depth with increasing melt superheat. The sump depth increases with superheat because the time required for the molten metal to solidify increases with the superheat in the metal. This effect is not very significant, however, as the heat released by solidification is much greater than the heat due to superheat in the metal.




It is evident that a long solidification interval, and hence strong segregation, can be expected for high casting speed, low heat transfer rate and high melt superheat. This is supported by experimental studies on centerline segregation in Al-Mg and Al-Fe alloys by Jin et al. [2]. They studied the appearance of centerline segregates in Al-4.0%Mg-1.0%Cu alloy during twin roll strip casting and observed that centerline segregation increased as solute content increased at a given speed and as casting speed increased for a given alloy.


Fig. 11. Sump depth as a function of melt superheat.


During thin strip casting, stresses develop in the solidified material primarily due to the thermal gradients that exist in the solid metal and the pressure exerted by the rolls. Fig. 12 shows the maximum principal stress across the cast strip, for different casting speeds, at the point where the strip leaves the rolls. The surface of the strip experiences the maximum stress due to the roll force and the chilling action of the rolls.


[Fig. 12 plot: maximum principal stress across the cast strip as a function of casting speed, with curves for casting speeds of 5, 18, 20 and 25 mm/sec.]


Fig.  12.  Maximum  principal  stress  across  the  strip  for  different  casting  conditions. 


It can be seen that as the casting speed increases, the magnitude of stresses in the solidified material also increases. This is because, at higher casting speeds, the contact time of the metal with the rolls is shorter, leaving less time for homogenization of temperature. Consequently, high temperature gradients exist in the material when the casting speed is high, resulting in higher stresses in the material.




SUMMARY 

In this paper, an overview of the thin strip casting processes for steel and aluminum is presented. A more detailed analysis of two casting processes, one for steel and one for aluminum, is given. A two-dimensional mathematical model is presented to simulate the turbulent fluid flow, heat transfer and solidification in the two-roll melt drag process. The relationship between cast strip thickness and operating parameters such as roller speed, gap heat transfer coefficient and melt superheat was obtained from the model.

From the stress analysis of the two-roll melt drag process for thin strip casting of steels, it can be concluded
that the position of the solid-liquid interface in the melt pool determines the magnitude of the thermally
induced stresses in the material. Hence, operating parameters such as casting speed, temperature of the roll
coolant, melt superheat and gap heat transfer coefficient should be strictly controlled to avoid situations
where solidification is complete well before or well after the kissing point.

During thin strip casting of aluminum using a horizontal twin-roll caster, the extent of centerline
segregation and thermally induced stresses increases with process parameters such as casting speed and melt
superheat, and decreases with a higher heat transfer rate across the slab/roll interface. Hence a low casting
speed would be desirable for smooth casting operation, but it would reduce productivity. High productivity
can be achieved by increasing the casting speed and offsetting its effect by ensuring a higher rate of heat
withdrawal from the melt pool through the rollers.


REFERENCES 

1.  Gupta,  Shailesh,  1998.  Mathematical  modeling  of  thin  strip  casting  processes,  M.S.  thesis,  Department 
of  Materials  Science  and  Engineering,  Ohio  State  University,  Columbus,  Ohio. 

2. Jin, I., Morris, L.R., Hunt, J.D., 1982. Light Metals, TMS, p.873.

3. Bagshaw, M.J., Hunt, J.D., Jordan, R.M., 1988. Cast Metals, 1(1), p.16.

4. Gupta, Manish, 1999. M.S. thesis, Department of Materials Science and Engineering, Ohio State
University, Columbus, Ohio.

5. Ramacciotti, A., 1988. Steel Research, 59(10), 438-448.




Intelligent  Manufacturing  I 






Agent-based  Control  of  Manufacturing  Systems 

L.  Monostori,  B.  Kadar 

Computer  and  Automation  Research  Institute,  Hungarian  Academy  of  Sciences 
Kende  u.  13-17,  Budapest,  POB  63,  H-1518,  Hungary 


ABSTRACT 

Management  of  complexity,  changes  and  disturbances  is  one  of  the  key  issues  in  production  today. 
Distributed,  agent-based  (holonic)  structures  represent  viable  alternatives  to  hierarchical  systems  provided 
with  reactive  /  proactive  capabilities.  The  paper  outlines  the  difficulties  which  hinder  their  industrial 
acceptance.  Several  approaches  to  overcome  these  barriers  are  introduced,  i.e.  the  use  of  simulation 
techniques  for  developing  and  testing  agent-based  control  architectures,  the  holonification  of  existing 
resources  and  traditional  (centralized  /  hierarchical)  manufacturing  systems.  Finally,  the  cooperative  use  of 
agent-based  distributed  control  structures  and  the  more  centralized  (e.g.  GA-based)  schedulers  is  proposed 
aiming  at  systems  which  can  handle  critical  complexity,  reactivity,  disturbance  and  optimality  issues  at  the 
same  time. 


INTRODUCTION 

Attempts have been made to develop novel manufacturing architectures that can deal with the problems
indicated in the abstract, i.e. growing complexity, changes, disturbances and uncertainties. The best-known
approaches are fractal manufacturing [1], bionic manufacturing [2, 3], random manufacturing [4], and
holonic manufacturing [5].

Over the past years significant research efforts have been devoted to the development and use of
Distributed Artificial Intelligence (DAI) techniques (e.g. [6]). An agent is a real or virtual entity able to act
on itself and on the surrounding world, which is generally populated by other agents. Its behavior is based
on its observations, knowledge and interactions with the world of other agents. An agent has several
important features: it has capabilities of perception and a partial representation of the environment, can
communicate with other agents, can reproduce child agents, and has its own objectives and an autonomous
behavior [7]. A multi-agent system (MAS) is an artificial system composed of a population of autonomous
agents, which cooperate with each other to reach common goals while simultaneously pursuing individual
objectives [7].

Holonic manufacturing systems (HMSs), as one of the new paradigms in manufacturing, consist of
autonomous, intelligent, flexible, distributed, cooperative agents or holons [5]. Three types of basic holons,
namely resource holons, product holons and order holons, together with the main information flows
between them, are defined in [8]. These basic entities are structured by using object-oriented concepts such
as aggregation and specialization. Staff holons are also foreseen to assist the basic holons in performing
their work. A reference architecture for holonic manufacturing systems is given in [8]. Other authors refer
to only two types of basic building blocks, e.g. order and machine agents in [9], job and resource agents in
[10], and order and machine (resource) holons in [11]. A common feature of these approaches is that the
functions of the order and product holons are somehow integrated in one basic type. One of the most
promising features of the holonic approach is that it represents a transition between fully hierarchical and
heterarchical systems [5, 12].

The  main  questions  concerning  agent-based  manufacturing  architectures  are  as  follows  [11]: 

- Agents' structure: internal structure of agents and the level of their self-containment.

- Communication: communication protocol, common interchange language.

- Group formation: negotiation, suitable communication protocol, persuasion of machines to
participate in a group, reward / penalty systems, market mechanisms, agents' objective function.

- Reconfigurability: system openness (machine addition, deletion, substitution).


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




-  Scalability:  potential  appropriateness  for  scaling  up  to  the  level  of  the  extended  enterprise  [13]. 

- Global versus local optima: how to reach global optima with selfish agents pursuing their own goals;
what the optimal ratio of hierarchy and heterarchy is in a given situation; how to tune the system;
how to accomplish learning at system and agent levels.

Distributed, holonic-like systems represent viable alternatives to hierarchical and heterarchical structures
and the corresponding reactive scheduling approaches [14]. The industrial acceptance of holonics, however,
is relatively low, among other things because of

-  the  relative  crudeness  of  the  agent  theory  and  its  manufacturing  applications, 

-  the  insufficient  communication  and  decision  making  capabilities  of  present  numerical  controls, 

-  the  high  investment  costs  of  a  production  system  working  according  to  the  agent  principles, 

- the seemingly insurmountable difficulties in their stepwise integration into existing production
systems [15].

Several approaches are introduced and treated in the paper to overcome the above difficulties:

- the use of the simulation technique for the development of agent-based control architectures,

- the holonification of existing resources,

- the holonification of traditional systems by using the virtual manufacturing (VM) concept.


DEVELOPMENT  OF  AGENT-BASED  ARCHITECTURES  BY  SIMULATION 

Simulation is usually an efficient technique for making difficult problems more tractable, i.e. for
illustrating the feasibility of related approaches. Within the scope of the paper, the simulation technique
can contribute to answering at least the first part of the questions enumerated above, elaborating new
algorithms, decreasing the risk of investments, and developing the (virtual) information system of the HMS
by using the virtual manufacturing (VM) concept introduced in [16].

The  questions  enumerated  above  can  be  answered  by  extensive  simulation  only.  The  object-oriented 
framework for the development and evaluation of distributed manufacturing architectures described in [11]
provides  a  root  model  that  represents  a  plant  and  can  contain  different  agents.  The  object  library 
incorporates  two  main  agent  types:  resource  agent  and  order  agent.  A  plant  in  the  model  will  contain  only 
one  order  agent  which  is  responsible  for  order  processing,  job  announcements  and  job  dispatching  between 
different  resources  or  groups  (Figure  1).  A  model  may  incorporate  several  resource  objects  which  can  be 
initialized  during  construction  (giving  the  name  of  the  resource,  process-capabilities  of  the  resource,  etc.). 


Fig.  1.  Structure  of  a  resource  agent  [11]. 


Agent objects contain different, functionally separated subagents. Each agent in the framework incorporates
a communication subagent, through which it can send and receive messages using a modified version of
the contract net protocol [17]. Each resource agent has a resource supervisor subagent, which controls
its actions. All agents contain a registration mechanism through which they register and unregister
themselves. Each agent has local knowledge and databases, which store information about machine
capability, the time intervals allocated to different jobs, the groups in which the agent is interested, etc.
Any information concerning the agent itself can be accessed only through the communication subagent by a
request message. Only one information provider, the registration book, is kept centrally in the system.
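
The agent structure described above can be illustrated with a minimal Python sketch. It is not the framework of [11]: the class names, the message strings ("announce", "award") and the bidding rule (earliest promised completion time) are assumptions made for illustration, and the negotiation is a plain, simplified contract-net round rather than the modified protocol used by the framework.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    operation: str   # required process capability, e.g. "milling"
    duration: float  # processing time in arbitrary time units


class RegistrationBook:
    """The only centrally kept information: which agents are in the plant."""

    def __init__(self):
        self._members = {}

    def add(self, agent):
        self._members[agent.name] = agent

    def remove(self, agent):
        self._members.pop(agent.name, None)

    def members(self):
        return list(self._members.values())


@dataclass
class ResourceAgent:
    name: str
    capabilities: set
    busy_until: float = 0.0                       # local knowledge
    schedule: list = field(default_factory=list)  # allocated time intervals

    def register(self, book):
        book.add(self)

    def unregister(self, book):
        book.remove(self)

    def receive(self, message, job):
        """Communication subagent: the only way to reach the local data."""
        if message == "announce":
            if job.operation not in self.capabilities:
                return None                        # no bid
            return self.busy_until + job.duration  # bid: promised completion
        if message == "award":
            start = self.busy_until
            self.busy_until = start + job.duration
            self.schedule.append((job.name, start, self.busy_until))
            return self.busy_until
        raise ValueError(f"unknown message: {message}")


class OrderAgent:
    """Announces jobs, collects bids and awards each job to the best bidder."""

    def __init__(self, book):
        self.book = book

    def dispatch(self, job):
        bids = []
        for agent in self.book.members():
            offer = agent.receive("announce", job)
            if offer is not None:
                bids.append((offer, agent))
        if not bids:
            return None
        _, winner = min(bids, key=lambda bid: bid[0])
        winner.receive("award", job)
        return winner.name


book = RegistrationBook()
mill_1 = ResourceAgent("mill_1", {"milling"})
mill_2 = ResourceAgent("mill_2", {"milling", "drilling"})
mill_1.register(book)
mill_2.register(book)
order_agent = OrderAgent(book)
assignments = [order_agent.dispatch(Job(f"job_{i}", "milling", 2.0)) for i in range(3)]
```

With two equally capable mills the three jobs alternate between the resources, because each award lengthens the winner's promised completion time for the next announcement.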

The framework is used intensively for research purposes. A new approach to agent-based scheduling,
developed and tested with the framework, is described in [14].


HOLONIFICATION  OF  EXISTING  RESOURCES 

The above simulation framework is appropriate for supporting the realization of holonic manufacturing
systems in greenfield settings, i.e. from resources (e.g. CNC controlled machines) that are able to function
in a holonic way. With good reason, however, firms are interested in using, at least partly, their existing
resources or resources available on the market. There are some severe barriers that raise difficulties in
this respect:

- no computer numerical controls (CNCs) that satisfy the requirements of functioning in a holonic
way are on the market,

- the available CNCs provide communication facilities on different levels and their autonomy is rather
limited; the heterogeneity of the system makes the realization of the communication-intensive
holonic approach more difficult.

In  this  section  an  approach  to  the  holonification  of  existing  resources  (i.e.  converting  existing  resources  into 
holons)  is  suggested,  based  on  the  solutions  frequently  used  in  agent-based  software  engineering. 

Agent-based software engineering was invented to facilitate the creation of interoperable software. The
questions raised by software engineers are similar to those raised by the holonification of existing
manufacturing resources: what about programs that have already been written, the so-called legacy software?

Techniques for converting legacy software into software agents [18] can also be used for
holonification, as follows:

- Implementation of transducers, which mediate between an existing program and other agents. This is
especially useful in situations where the code of the program is unavailable or too delicate to
modify. Transducers can be implemented by the developers of HMSs or even by independent
organizations, provided that the CNC in question offers two-directional communication
facilities.

- Wrapping, i.e. injecting code into the program to allow it to communicate in an agent communication
language (ACL). This effective solution, however, requires that the code of the program be available.
Wrapping is a feasible solution not only for CNC manufacturers but also for users whose CNCs are
sufficiently open.

- Rewriting the whole program, which is possible only for those in full possession of the original
program. In practice, rewriting can be done only by CNC manufacturers, as the sole owners of the source
code, and usually only when the experts, e.g. the former programmers, are available.
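
The transducer option can be sketched in Python as follows. The legacy controller, its text commands (LOAD, START, STATUS) and the dictionary-based message format are all hypothetical stand-ins chosen for illustration; a real CNC and a real agent communication language would look different. The point is only that the transducer translates in both directions without touching the legacy code.

```python
class LegacyCNC:
    """Stand-in for an existing controller with a fixed text interface."""

    def __init__(self):
        self._program = None
        self._running = False

    def send(self, command):
        parts = command.split()
        if not parts:
            return "ERR UNKNOWN"
        if parts[0] == "LOAD" and len(parts) == 2:
            self._program = parts[1]
            return "OK"
        if parts[0] == "START":
            if self._program is None:
                return "ERR NO_PROGRAM"
            self._running = True
            return "OK"
        if parts[0] == "STATUS":
            return f"RUNNING {self._program}" if self._running else "IDLE"
        return "ERR UNKNOWN"


class Transducer:
    """Mediates between agent messages and the unmodified legacy interface."""

    def __init__(self, name, cnc):
        self.name = name
        self._cnc = cnc

    def _reply(self, performative, **fields):
        return {"performative": performative, "sender": self.name, **fields}

    def handle(self, message):
        kind = message.get("performative")
        if kind == "request" and message.get("action") == "execute":
            program = message.get("program")
            if program is None:
                return self._reply("failure", reason="missing program")
            # Translate one agent-level request into legacy commands.
            for command in (f"LOAD {program}", "START"):
                answer = self._cnc.send(command)
                if answer != "OK":
                    return self._reply("failure", reason=answer)
            return self._reply("inform", content="started")
        if kind == "query" and message.get("about") == "status":
            # Translate the legacy answer back into an agent-level message.
            return self._reply("inform", content=self._cnc.send("STATUS"))
        return self._reply("not-understood")


holon = Transducer("cnc_1", LegacyCNC())
before = holon.handle({"performative": "query", "about": "status"})
started = holon.handle({"performative": "request", "action": "execute", "program": "P100"})
after = holon.handle({"performative": "query", "about": "status"})
```

A wrapper would differ in that the message handling is placed inside the controller code itself, which is why it requires access to the source.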

It  is  important  to  note  that  manufacturing  agents  (holons)  can  include  not  only  information  processing,  but 
also  material  processing  activities. 


Fig. 2. Approaches to holonification of existing resources




Holonification of existing resources requires a holonificator (Figure 2), which realizes all the functions
the resource needs in order to act as a holon (communication, autonomy). Depending on how the coupling
between the legacy system (resource) and the holonificator is realized, loosely and tightly coupled
holonification approaches can be distinguished, resembling the transducer and wrapper concepts of
software agentification.

Holonification of existing resources is an effective technology for the stepwise realization of HMSs, i.e. in
cases where the system's resources are, fully or partly, traditional ones without the autonomy and
cooperation facilities required by the holonic concept. Figure 3 illustrates a heterogeneous system where
"real" and holonificated resource holons form an HMS. (The holonificated resource holons can be replaced
by "real" holonic ones in a stepwise manner.)


Fig. 3. Heterogeneous HMS consisting of "real" resource holons and
resources holonificated by different approaches


However, the holonification of existing resources and the development of the whole information system
constitute a large undertaking which, taking the open questions of holonics into account, seems somewhat
premature.


HOLONIFICATION  OF  TRADITIONAL  SYSTEMS  BY  USING 
THE  VM-CONCEPT 

In this section a novel approach to the holonification of whole manufacturing systems is introduced, based
on an extension of the Virtual Manufacturing (VM) concept. Virtual manufacturing is a relatively new concept
of executing manufacturing processes in computers as well as in the real world [16]. Manufacturing
subsystems can be classified into four categories: Real Physical System (RPS), Real Informational System
(RIS), Virtual Physical System (VPS) and Virtual Informational System (VIS). VM makes it possible to simulate
manufacturing processes in advance, without using real facilities, and in this way to accelerate the design
and re-design of real manufacturing systems [19].

A fundamental feature of the VM concept is that it realizes a one-to-one mapping between the real and
virtual systems, i.e. VIS and VPS try to simulate RIS and RPS, respectively, as exactly as possible. In this
section an extension of the VM concept is suggested and illustrated. The main novelty of the approach is the
break with this one-to-one mapping, more exactly the use of the VM concept to control a traditional
(centralized / hierarchical) manufacturing system in a holonic way.

Suppose there is a central control unit in the traditional system. The fundamental requirements for
holonification of this system by the above approach are then as follows: the ability to communicate with the
outside world, the transfer of control information to the resources, the capture of state information and
its transfer to the central unit, the interruption of the functioning of resources at given periods, and the
stopping or modifying of processes started previously.


The virtual part of the system (Figure 4) runs in a holonic way and incorporates order management,
scheduling and control issues. To realize the virtual part, approaches such as the framework for developing
and testing agent-based manufacturing architectures described earlier in this paper can be used to
advantage. Resource agents which, from a technological point of view, correspond to the real resources of
the traditional system can easily be constructed by using the object library of the framework [11]. Order
management proceeds fully in the virtual system.


Decisions are taken in the virtual, holonic system and conveyed to the VIS of the traditional system. The
real production situation is sensed by the RPS and forwarded to the VIS, which initiates appropriate
measures in a holonic way. In summary, the traditional system shows holonic behavior. Naturally,
requirements like those enumerated here are important prerequisites for the approach. The holonic
information system (VIS) tested in a virtual environment has the potential to be used in a real holonic
system.
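
The control loop described in this section can be sketched in Python under strong simplifications. The class and method names are hypothetical, the central control unit is reduced to the few capabilities listed as requirements (transfer of control information, capture of state, interruption of resources), and the holonic decision making of the virtual part is replaced by a trivial first-come, first-served rule that merely stands in for negotiation among agents.

```python
class CentralControlUnit:
    """Stand-in for the traditional central controller (hypothetical interface)."""

    def __init__(self, machines):
        self._job = {m: None for m in machines}  # job running on each machine
        self._up = {m: True for m in machines}   # availability of each machine

    def start(self, machine, job):
        """Transfer control information to a resource."""
        self._job[machine] = job

    def stop(self, machine):
        """Interrupt a resource and hand back the job it was running."""
        job, self._job[machine] = self._job[machine], None
        return job

    def state(self):
        """Capture state information: machine -> (available, current job)."""
        return {m: (self._up[m], self._job[m]) for m in self._job}

    # The two methods below only simulate events on the shop floor.
    def break_down(self, machine):
        self._up[machine] = False

    def finish(self, machine):
        self._job[machine] = None


class VirtualHolonicSystem:
    """Virtual part: takes the decisions and conveys them to the control unit."""

    def __init__(self, control_unit):
        self.control_unit = control_unit
        self.waiting = []

    def submit(self, job):
        self.waiting.append(job)
        self.react()

    def react(self):
        state = self.control_unit.state()
        # Withdraw jobs from machines that are reported as unavailable.
        for machine, (available, job) in state.items():
            if not available and job is not None:
                self.waiting.insert(0, self.control_unit.stop(machine))
        # Assign waiting jobs to machines that are available and idle.
        for machine, (available, job) in state.items():
            if available and job is None and self.waiting:
                self.control_unit.start(machine, self.waiting.pop(0))


plant = CentralControlUnit(["M1", "M2"])
virtual = VirtualHolonicSystem(plant)
for job in ("A", "B", "C"):
    virtual.submit(job)
plant.break_down("M1")  # disturbance in the real system
virtual.react()         # job A is withdrawn from M1 and queued again
```

The traditional system never decides anything itself in this sketch; it only executes commands and reports state, which is the sense in which it shows holonic behavior.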


Fig.  4.  Concept  for  holonification  of  traditional  manufacturing  systems  by  using  the  VM  technology 


CONCLUSIONS  AND  FUTURE  ISSUES 

Distributed,  holonic-like  systems  represent  viable  alternatives  to  hierarchical  and  heterarchical  structures 
and  the  corresponding  reactive  scheduling  approaches. 

Both of these apparently distinct lines can be used to advantage in rapidly changing production
environments; however, neither can be regarded as a universal tool. Even such powerful global optimization
techniques as genetic algorithms (GAs) [20, 21] are not appropriate for very large problems and, as
centralized approaches, are not free of the known drawbacks of centralized / hierarchical
control systems. The multi-agent approach also becomes unrealistic beyond a given problem size, first of all
because of the rapidly increasing communication burden. Consequently, both directions call for some kind of
aggregation [8]. There is also scope for temporary or permanent hierarchies in holonic architectures [12].

A fundamental issue in distributed systems is how to ensure global performance with selfish agents pursuing
their own goals [22]. The integration of scheduling agents, as a kind of staff holon [8], into the
architecture can contribute to achieving global coherence in distributed systems. This enhancement of the
basic, heterarchically dominant structure can be accomplished in several ways:

-  There  is  one  central  scheduling  agent  in  the  system,  which  in  normal,  static  conditions  can  ensure 
global  performance  if  other  agents  follow  its  command  or  accept  its  advice.  In  changing  conditions, 
however,  through  increased  autonomy  of  agents,  a  more  dynamic  behavior  can  be  reached. 




-  The  opportunity  to  incorporate  a  scheduling  agent  into  any  aggregated  holon,  i.e.,  to  introduce  some 
hierarchy  in  temporal  or  static  coalitions,  is  expected  to  result  in  systems  with  a  more  balanced 
operation  and  to  resolve  the  questionable  scalability  of  the  approaches  investigated  in  the  paper.  In 
this  way,  the  amount  of  necessary  communication  can  be  decreased,  and  the  holonic  concept  can  be 
expanded  to  the  level  of  extended  or  virtual  enterprises. 
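
The effect of aggregation on the communication burden can be illustrated with a simple message count. This is a back-of-the-envelope model, not a result from the paper. It assumes one announcement and one bid per addressee plus one award message in the flat case, and it assumes that each group's scheduling agent can bid from its own aggregated knowledge without polling its members, with one extra award message passed on inside the winning group.

```python
import math


def flat_messages(n_resources):
    """Messages per job when the order agent addresses every resource agent."""
    # n announcements + n bids + 1 award
    return 2 * n_resources + 1


def grouped_messages(n_resources, group_size):
    """Messages per job when only group-level scheduling agents are addressed."""
    n_groups = math.ceil(n_resources / group_size)
    # n_groups announcements + n_groups bids + award to group + award to member
    return 2 * n_groups + 2


flat = flat_messages(100)          # 201
grouped = grouped_messages(100, 10)  # 22
```

Under these assumptions the saving comes entirely from the scheduling agents answering on behalf of their groups; if they had to consult every member for each announcement, the total number of messages would not decrease.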

The object-oriented simulation framework described in the paper is expected to provide the necessary
basis for further research addressing such fundamental issues of distributed manufacturing as agent
structures, communication protocols, group formation, negotiation, global versus local optima, scalability,
system tuning, agents' behavior, and learning at system and agent level [23].

ACKNOWLEDGEMENT 

This  work  was  partially  supported  by  the  National  Research  Foundation,  Hungary,  Grant  Nos.  T026486 
and  F023628.  A  part  of  the  work  was  covered  by  the  National  Committee  for  Technological  Development, 
Hungary  Grants  (EU-96-B4-025  and  EU-97-A3-099)  promoting  Hungarian  research  activity  related  to  the 
ESPRIT LTR  Working  Groups  (IiMB  21108  and  IMS  21995). 

REFERENCES 

1. H.J. Warnecke, 1993. The Fractal Company, a revolution in corporate culture, Springer-Verlag.

2.  N.  Okino,  1993.  Bionic  manufacturing  systems,  in  CIRP,  Flexible  Manufacturing  Systems  Past- 
Present-Future,  Peklenik,  J.  (ed),  73-95. 

3. K. Ueda, 1993. A genetic approach toward future manufacturing systems, in CIRP, Flexible
Manufacturing Systems Past-Present-Future, Peklenik, J. (ed), 211-228.

4. K. Iwata, M. Onosato, M. Koike, 1994. Random manufacturing system: a new concept of
manufacturing systems for production to order, CIRP Annals, 43(1), 379-383.

5.  P.  Valckenaers,  F.  Bonneville,  H.  Van  Brussel,  L.  Bongaerts,  J.  Wyns,  1994.  Results  of  the  holonic 
system  benchmark  at  KULeuven,  Proc.  of  the  Fourth  Int.  Conf.  on  Computer  Integrated  Manufacturing 
and  Automation  Technology,  Oct.  10-12,  Troy,  New  York,  128-133. 

6. A.H. Bond, L. Gasser (Eds.), 1988. Readings in DAI, Morgan Kaufmann.

7.  K.  Koussis,  H.  Pierreval,  N.  Mebarki,  1997.  Using  multi-agent  architecture  in  FMS  for  dynamic 
scheduling.  Journal  of  Intelligent  Manufacturing,  16(8),  41-47. 

8.  H.  Van  Brussel,  J.  Wyns,  P.  Valckenaers,  L.  Bongaerts,  P.  Peeters,  1998.  Reference  architecture  for 
holonic  manufacturing  systems,  Computers  in  Industry,  Special  Issue  on  IMS,  37(3),  255-276. 

9. H.-P. Wiendahl, V. Ahrens, 1997. Agent-based control of self-organized production systems, CIRP
Annals, 46(1), 365-368.

10. M.M. Tseng, M. Lei, C. Su, 1997. A collaborative control system for mass customization
manufacturing, CIRP Annals, 46(1), 373-376.

11. B. Kadar, L. Monostori, E. Szelke, 1997. An object oriented framework for developing distributed
manufacturing architectures, Proc. of the Second World Congress on Intelligent Manufacturing
Processes and Systems, Budapest, Hungary, 548-554, and in J. Intel. Manu., 9(2), 1998, 173-179.

12. L. Bongaerts, L. Monostori, D. McFarlane, B. Kadar, 1998. Hierarchy in distributed shop floor control,
Proc. of 1st Inter. Workshop on Intel. Manu. Sys., IMS Europe 1998, Lausanne, Switzerland, 97-113.

13.  H.  Van  Brussel,  P.  Valckenaers,  J.  Wyns,  L.  Bongaerts,  J.  Detand,  1996.  Holonic  manufacturing 
systems  and  IiM.  in:  IT  and  Manufacturing  Partnerships,  Conf.  on  Integration  in  Manu.,  Galway, 
Ireland,  185-196. 

14. L. Monostori, B. Kadar, J. Hornyak, 1998. Approaches to manage changes and uncertainties in
manufacturing, CIRP Annals, 47(1), 365-368.

15. B. Kadar, L. Monostori, 1998. Agent based control of novel and traditional production systems, Proc.
ICME98, CIRP Inter. Sem. on Intel. Comp. in Manu. Eng., Capri, Italy, 31-38. (key-note paper)

16.  M.  Onosato,  K.  Iwata,  1993.  Development  of  a  virtual  manufacturing  system  by  integrating  product 
models  and  factory  models,  CIRP  Annals,  42(1),  475-478. 

17. R.G. Smith, 1980. The Contract Net Protocol: High-level communication and control in a distributed
problem solver, IEEE Trans. on Computers, Vol. C-29, 12, Dec., 1104-1113.

18.  M.R.  Genesereth,  S.P.  Ketchpel,  1994.  Software  agents,  Communications  of  the  ACM,  37(7),  48-53. 




19.  K.  Iwata,  M.  Onosato,  M.K.  Teramoto,  S.  Osaki,  1995.  A  modeling  and  simulation  architecture  for 
virtual  manufacturing  systems,  CIRP  Annals,  44(1),  399-402. 

20. H.-P. Wiendahl, R. Garlichs, 1994. Decentral production scheduling of assembly systems with genetic
algorithm, CIRP Annals, 43(1), 389-395.

21. J. Hornyak, L. Monostori, 1998. Genetic algorithms for predictive and reactive scheduling of
manufacturing systems, Proc. ICME98, CIRP Int. Sem. on Intel. Comp. in Manu. Eng., Capri, Italy, 213-220.

22.  J.  Hatvany,  1985.  Intelligence  and  cooperation  in  heterarchic  manufacturing  systems,  Robotics  &  CIM, 
2(2),  101-104. 

23.  L.  Monostori,  A.  Markus,  H.  Van  Brussel,  E.  Westkamper,  1996.  Machine  learning  approaches  to 
manufacturing,  CIRP  Annals,  45(2),  675-712. 






Intelligent  Database  Support  for  Manufacturing  and 
Processing  of  Industrial  Materials 

S.A.  Ehikioya,  E.G.  Truelove,  and  T.T.  Tran 

Department  of  Mathematics  and  Computer  Science, 

Brandon  University, 

Brandon,  MB,  Canada,  R7A  6A9 


ABSTRACT 

This  paper  describes  the  framework  and  characteristics  of  an  Intelligent  Database  System  that  supports  and 
satisfies  business  needs  and  requirements  of  industrial  materials  manufacturing  and  processing.  We  also 
examine  the  methods  to  design  and  build  such  systems  and  discuss  the  issues  and  technologies  that  require 
attention.  We  focus  on  intelligent  database  systems  support  for  a  reasoned  choice  among  alternatives,  and 
the  manner  in  which  their  capabilities  can  be  extended  to  create  an  effective  decision  support  tool. 


INTRODUCTION 

Businesses  collect  and  use  data  regularly,  to  help  make  important  decisions.  Almost  all  decisions  rely  on 
data  in  some  way  and  the  quality  of  data  affects  the  quality  of  decisions,  sometimes  drastically  [1].  Today 
businesses are in global competition, where success depends not only on a long-term strategy but also
on how well a business supports decision-making that cannot be predicted in advance. With the development
of the computer, data storage and processing have become much more efficient. An organisation's
decision-makers must decide how to use and interpret data. With a large volume of data to be interpreted,
it can be difficult to extract the specific data needed for each decision. As technology advanced, Management
Information  Systems  emerged  to  provide  periodic  reports  containing  static,  descriptive  knowledge.  Since 
these  reports  were  only  produced  periodically,  they  had  limited  use  to  support  dynamic  business  decisions. 
This  has  necessitated  Decision  Support  Systems  (DSS)  to  help  management  with  critical  decision-making. 
A  DSS  is  a  "computer-based  information  system  that  provides  the  user  with  decision-oriented  information 
whenever decision-making situations arise" [2]. In other words, a DSS uses computer technologies to
process  and  refine  data,  and  present  the  correct  information  to  the  decision-maker  to  assist  in  solving 
unstructured  problems.  A  DSS  aids  the  decision-maker  to  make  informed  decisions  by  allowing  for  better 
decisions,  increasing  the  efficiency,  productivity,  and  effectiveness  of  decisions,  helping  manage  stored 
data,  and  supplementing  the  decision-maker's  abilities.  A  DSS  broadens  the  scope  of  decisions,  particularly 
semi-structured  and  unstructured  decisions,  and  so,  increases  business  productivity. 

The  shift  in  focus  from  transaction  processing  to  knowledge  processing  in  work  done  by  business  managers 
requires  an  intelligent  DSS  to  process  knowledge  for  sophisticated  managerial  decision-making.  The 
human  decision-maker  still  makes  the  decisions,  so  these  systems  are  really  support-systems.  DSS 
designers  must  understand  human  decision-making  and  the  domain  of  the  application.  Decision-making  is 
knowledge  intensive.  An  intelligent  DSS  can  combine  and  synthesize  solutions  of  sub-problems  into  the 
solution  of  a  larger  problem  thereby  enhancing  the  process.  Since  a  DSS  does  not  typically  make  decisions, 
a  successful  DSS  must  provide  sufficient,  accurate  information  in  a  useful,  efficient  way.  To  achieve  this, 
the  DSS  must  be  built  on  a  robust  data  delivery  architecture  that  addresses  all  end-user  requirements. 

Businesses  and  organizations  now  recognize  the  value  of  DSS,  substantially  increasing  demand  for 
efficient,  fully  functional,  well-designed  DSSs.  Many  systems  lack  an  Intelligent  Database  System  (IDBS) 
architecture  that  integrates  the  data  warehouse,  metadata,  and  a  graphical  user  interface  to  give  advanced 
DSS  features.  Without  an  architectural  blueprint,  it  is  not  possible  to  define  and  develop  fundamental  DSS 
components,  which  combine  to  provide  a  full  suite  of  DSS  capabilities  such  as  multi-dimensional  data 
viewing,  data  mining,  fast  query  processing,  and  elegant  reporting.  Data  mining  is  "the  extraction  of  hidden 
predictive information from large databases" [3]. Data mining extracts information quickly and efficiently
and  uses  existing  information  to  model  unknown  situations.  See  [3]  for  an  array  of  data  mining  techniques. 






In  this  paper,  we  focus  our  attention  on  development  issues  of  an  IDBS  using  Relational  On-line  Analytical 
Processing  (Relational  OLAP)  technology.  We  also  discuss  how  such  a  system  ensures  success  of  its  DSS 
counterpart.  In  particular,  we  detail  the  end-user  requirements,  the  main  features  that  an  IDBS  must  offer  to 
address  these  requirements,  the  design  issues,  as  well  as  techniques  and  tools  used  to  obtain  these  features. 
The  described  architecture  consists  of  components,  each  with  a  distinct  purpose  in  the  delivery  of  data.  This 
architecture:  (i)  provides  seamless  access  to  existing,  normalized  data  models  and  a  variety  of  performance- 
optimized  data  models  including  data  warehouses,  and  (ii)  enables  an  organisation  to  use  information  as  a 
strategic  asset  by  applying  their  databases  to  support  decision-making.  We  utilize  workflow  design 
methodology  and  fuzzy  logic/set  theory  to  formalize  the  correctness  of  our  underlying  IDBS  for  the  DSS. 

Relational  OLAP,  a  flexible  and  general  architecture  that  meets  a  wide  variety  of  DSS  and  OLAP  needs, 
allows  multidimensional  analysis  of  relational  data.  It  is  our  preferred  architecture.  See  [4]  for  details  of  our 
choice  over  multidimensional  OLAP  architectures.  This  framework  directly  provides  the  functionality  to 
support  an  intelligent  DSS.  Decisions  in  manufacturing  and  processing  of  industrial  materials  are  akin  to 
those  in  most  manufacturing  companies.  They  can  be  very  complex  and  can  include  scheduling  of  plants, 
downsizing  operations,  maintaining  market  share,  supplier  information,  inventory  analyses,  sales  analyses, 
and  customer  relations.  The  framework  is  applicable  across  many  organisational  settings. 


CHARACTERISTICS  AND  FRAMEWORK 

The  characteristics  and  framework  of  a  DSS  as  well  as  the  nature  of  decisions  must  be  understood  before 
beginning  system  design.  Decision  problems  can  be  very  complex  since  the  business  world  is  dynamic.  So 
quite  often,  there  are  situations  for  which  the  information  needed  is  not  ideal,  i.e.,  they  may  be  either  semi- 
structured,  or  unstructured.  In  a  semi-structured  situation,  there  are  some  unknowns,  or  some  areas  where 
knowledge  is  incomplete  or  unavailable.  The  task  may  not  be  completely  understood.  In  an  unstructured 
situation,  there  is  little  relevant  knowledge  available,  no  experience  with  the  task,  and  many  unknown,  un- 
reproducible  cases  leading  to  a  highly  uncertain  environment.  Problem-structuring  is  important  to  the 
overall  system  performance.  The  issues  and  implications  of  the  problem  must  be  identified,  in  the  belief 
that  better  understanding  will  lead  to  better  support  from  a  computerized  system.  An  intelligent  DSS  must 
support  many  cases  of  unstructured  decisions.  The  DSS  must  provide  alternatives,  or  when  given  a  set  of 
alternatives,  be  able  to  decide  upon  the  best  possible  decision  in  a  timely  and  efficient  manner. 

Features  common  to  most  DSSs  are:  (a)  a  large  knowledge  base  which  is  both  historical  and  immediate  in 
nature;  (b)  the  ability  to  respond  to  many  different  questions,  e.g.,  what-if  scenarios  and  queries;  (c)  the 
ability  to  interact  with  users  and  provide  flexible  decision-making  choices;  (d)  the  ability  to  acquire  new 
information,  which  may  be  descriptive  in  nature,  and  (e)  the  ability  to  maintain  data  and  other  kinds  of 
knowledge.  In  addition,  other  key  features  include:  (a)  a  multidimensional  conceptual  view  of  data;  (b)  an 
ability  to  create  complex  criteria  sets  that  establish  both  pattern  and  dimensions  in  the  data,  (c)  support  for 
hierarchical  consolidation  of  data,  and  the  ability  to  drill  down  into  detail  to  mine  relevant  sub-facts;  (d) 
support  for  rapid  responses  and  elegant  reporting  systems;  (e)  seamless  integration  of  the  system  with  other 
desktop  productivity  tools;  (f)  support  for  interpretation  of  query  results  and  facilities  to  automate 
transaction  initiations,  e.g.,  change  product  prices;  and  (g)  automation  of  common  tasks,  e.g.,  requests  for 
detailed  information  about  a  decision-support  inquiry.  The  DSS  allows  managers  to  make  strategic  and 
tactical  decisions  using  corporate  data.  The  system  allows  complex  analysis  of  large  input  data  sets  from 
different  perspectives.  Managers  can  manipulate  and  analyse  data  intuitively,  quickly,  and  flexibly. 

To  fulfill  the  above  user  requirements,  the  underlying  IDBS  must  provide  the  following  key  features: 

•  A  Criteria  Builder  Manager  to  allow  a  user  to  specify,  save,  and  reuse  filters  to  retrieve  data  from  the 
data  warehouse.  This  item  enables  a  user  to  specify  constraints,  data  source  domains,  specific  needs  in 
relation  to  the  current  decision,  and  various  confidence  measures  and  similarity  functions  used  to 
assess  the  recommendations.  The  degree  of  inclusion  is  also  specified  here  by  constructing  a  set  of 
values  and  conditions  to  create  dynamic  SQL  statements  used  to  retrieve  relevant  data.  This  item 
provides  configurable  functionality,  i.e.,  the  ability  to  customize  data  sources  [5]. 

•  The  Presentation  View  Manager  that  offers  facilities  to  customize  the  output  templates,  by  specifying 
the  format  e.g.,  tabular,  graphical,  in  which  to  present  query  results. 




•  The  Data  Rotation  Manager  that  allows  the  specification  of  how  query  results  are  presented  on  x-y 
axes.  This  provides  an  advanced  transform  module  to  create  integrated  views  of  important  information. 

•  The  Intelligent  Agent  Manager  that  enables  the  creation  of  customized  data  analyses,  the  mining  of 
data  using  predefined  criteria  sets  and  data  extraction  techniques,  the  differentiation  of  retrieved  data 
into  various  degrees  of  relevance,  and  the  reporting  of  the  data  to  the  user. 

•  The  Knowledge  System  is  the  data  that  a  business  has  stored,  quite  often  in  the  form  of  large  databases 
(both  operational  data  and  data  warehouses)  on  stable  storage  media  such  as  a  hard  disk.  Relevant 
operational  data  is  extracted  from  the  operational  database,  cleansed  and  summarized.  The  resulting 
data  is  formatted  for  decision-support  before  uploading  it  into  the  data  warehouse. 

•  The  Problem-Processing  System,  the  engine  of  the  DSS,  processes  information  and  recognizes  and 
solves  problems.  It  consists  of  modules  for  data  mining,  OLAP,  and  fast,  dynamic  query  processing. 
Data mining provides trends, influence factors and relationships in data, e.g., what impacts sales of rod
in  Toronto.  OLAP  provides  multidimensional  processing  of  data,  e.g.,  sales  performance  for  rod  by 
state  by  month  and  by  brand.  Query  processing  allows  a  user  to  ask  questions.  The  Problem-Processing 
System  can  use  the  contents  of  the  Knowledge  System  to  acquire  information  to  solve  problems.  The 
system  uses  fast-query  processing  to  access  data.  The  activity  is  RAM-based  to  enhance  performance. 

•  The Metadata System that provides logical linkage between data and applications, identifies the
contents and location of data, and includes definitions from a decision-support perspective. Since
Metadata provides decision-oriented pointers to enterprise [warehouse] data, it acts as a bridge
between enterprise data and the decision-support application. When combined with the DSS engine,
this module enables pinpoint access to information across the entire data warehouse.
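
The OLAP example given in the list above (sales performance for rod by state, by month and by brand) can be sketched on a relational store as a grouped aggregation. The following is an illustration only, using Python's built-in sqlite3 module. The table name sales, its columns, and the sample rows are hypothetical and are not taken from the prototype.

```python
import sqlite3

# Hypothetical fact table; names and figures are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, state TEXT, month TEXT, brand TEXT, amount REAL)"
)
rows = [
    ("rod", "Ontario", "1999-01", "A", 120.0),
    ("rod", "Ontario", "1999-01", "B", 80.0),
    ("rod", "Ontario", "1999-02", "A", 150.0),
    ("rod", "Manitoba", "1999-01", "A", 60.0),
    ("bar", "Ontario", "1999-01", "A", 999.0),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)


def rollup(conn, product, dimensions):
    """Total sales of one product, grouped by the given dimension columns."""
    allowed = {"state", "month", "brand"}
    if not dimensions or not set(dimensions) <= allowed:
        raise ValueError("unknown dimension")
    cols = ", ".join(dimensions)
    # Column names come from the whitelist above; the value is a bound parameter.
    sql = (
        f"SELECT {cols}, SUM(amount) FROM sales "
        f"WHERE product = ? GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(sql, (product,)).fetchall()


by_state = rollup(conn, "rod", ["state"])
by_state_month = rollup(conn, "rod", ["state", "month"])
```

Changing the list of dimensions passed to rollup corresponds to viewing the same relational data along different axes, which is the essence of the relational OLAP approach.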

The data cleansing activity of the underlying intelligent database, i.e., the process of ensuring that all
values in a data set are consistent and correctly recorded, should be robust and efficient. It
addresses the Metadata preparation and storage needs, thereby creating a clear, clean, integrated view of the
data that can be distributed effectively for decision-making. This improves the accuracy of the data and provides
the most cost-effective way to retrieve them. During analysis, the retrieved metadata are fuzzified to provide
additional information about the strength of various attributes and to resolve any expressive deficiencies of the
original Metadata sets. We associate a fuzzy measure with each attribute to provide a measurement of its
relevance to the decision. This is similar to assigning a probability value ranging from 0 to 1. Fuzziness
allows greater flexibility in classifying decision variables.
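
One possible reading of the fuzzy measure described above is a grade in the interval [0, 1] attached to each attribute, with a threshold (an alpha-cut) selecting the attributes considered relevant to a decision. The sketch below assumes exactly that. The attribute names, the grades, and the threshold are hypothetical, and the paper does not state how the grades are obtained.

```python
def alpha_cut(relevance, alpha):
    """Return the attributes whose fuzzy relevance grade is at least alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    for name, grade in relevance.items():
        if not 0.0 <= grade <= 1.0:
            raise ValueError(f"grade for {name!r} must lie in [0, 1]")
    return sorted(name for name, grade in relevance.items() if grade >= alpha)


# Hypothetical grades for the attributes of a pricing decision.
relevance = {"unit_cost": 0.9, "region": 0.6, "invoice_colour": 0.1}
strong = alpha_cut(relevance, 0.5)
```

Lowering alpha admits weaker attributes, so the same metadata can be classified more or less strictly without changing the stored grades.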

To  be  intelligent,  a  database  system  must  efficiently  process  user  queries,  and  allow  a  wide  variety  of 
options  for  querying.  The  ability  to  extract  the  most  relevant  attributes  to  discriminate  among  decisions  is 
key  to  a  successful  IDBS  to  improve  decision  support.  Various  techniques  are  used  to  optimize  the  retrieval 
of  relevant  data.  These  techniques  will  be  discussed  in  a  later  section.  Figure  1  illustrates  the  architecture  of 
our  IDBS.  The  criteria  builder,  presentation  view,  and  data  rotation  managers  are  the  underlying  layers  of 
the  user  interface  which  consists  of  the  language  and  report  systems. 


Fig.  1.  The  Generic  IDBS  Framework 




Many techniques can be used to support DSS: mathematical modeling, such as discriminant analysis,
rough and fuzzy sets analysis, and artificial neural networks [6]. These techniques enhance a DSS’s
capabilities,  so  it  can  apply  a  greater  degree  of  intelligence  to  support  a  manager’s  decisions.  The 
methodologies  and  tools  needed  to  build  a  DSS  are  discussed  in  the  following  section.  Note  that  intelligent 
database  support  for  quality  control  is  essential  in  all  database  applications,  to  ensure  the  integrity  of  the 
data  on  which  critical  decisions  are  based. 


METHODOLOGIES 

Design  of  a  DSS  can  be  done  in  several  ways;  some  are  formal  approaches  that  apply  a  specific 
methodology while others are informal approaches that do not require strict adherence to specific steps. We
highly  recommend  the  formal  approaches  because  of  the  benefits  (see  [7]  for  details)  associated  with  their 
use.  A  system  development  life  cycle  (SDLC)  [2]  is  a  common  formal  approach,  which  comes  in  many 
flavours,  but  usually  incorporates  the  steps  shown  in  Figure  2. 


Fig.  2.  The  Systems  Development  Life  Cycle 

Various  methodologies  follow  the  above  approach  but  only  "structured  analysis"  will  be  discussed  along 
with  its  applicability  to  design  of  a  DSS.  Structured  analysis  is  primarily  based  on  the  representation  of 
systems-essential  processes  with  a  variety  of  models,  including  data  models,  logical  data  flow  diagrams, 
and  physical  data  flow  diagrams.  One  of  the  first  steps  in  any  analysis  is  to  determine  the  system 
requirements.  For  a  DSS,  this  involves  several  things. 

•  What  are  the  kinds  of  data  for  which  the  knowledge  system  will  be  responsible?  It  is  not  necessary  to 
plan the database itself at this point, but a general idea of the data to be stored is an asset. Data
models  can  be  drawn  to  show  the  types  of  data  and  the  relationships  between  them. 

•  Who  will  be  using  the  DSS?  It  is  very  important  to  have  end-user  participation  to  successfully  model 
the language and presentation systems so that the DSS is easy to understand and accessible. Interface
models  can  be  drawn  to  represent  communication  with  the  user. 

•  What  type  of  decisions  will  the  system  be  required  to  support?  When  analyzing  the  problem-processing 
system,  relevant  problem  recognition  and  solving  means,  the  knowledge  used  in  decision-making,  the 
organization context in which the system will be used, and the behavior and inclination of the
decision-makers should be taken into consideration [8].

•  What  processes  are  required  and  how  do  they  relate  to  each  other?  This  is  where  the  data  flow 
diagrams  are  designed.  The  processes  needed  for  the  functioning  of  the  DSS  must  be  identified  and  a 
logical data-flow diagram can be drawn to show what the system should do.

To design the language system and report system, it is necessary to model the user interface. The
problem-processing system uses data-flow diagrams with process control and workflow diagrams. Design of the
knowledge  system  includes  modeling  of  the  database  structure  itself,  and  deciding  on  the  best  format  to 
store  the  data.  Implementation  involves  putting  the  knowledge  system  on  some  type  of  storage  medium, 
and  organizing  the  other  pieces  of  the  DSS  to  create  an  operational  system.  Extensive  tests  must  be 
performed  on  the  new  system  to  ensure  that  it  runs  to  satisfaction. 




The  formal  approach  prescribed  by  structured  analysis  ensures  that  all  proper  steps  are  followed  so  that  all 
requirements  are  discovered  and  modeled  correctly.  A  structured  analysis  also  gives  the  end-users  of  the 
system  a  great  deal  of  input  into  how  the  system  will  behave. 


TOOLS 

Information  System  developers  use  a  variety  of  tools  to  create  DSSs,  including  3rd  and  4th  generation 
language  application  development  tools.  Although  these  tools  provide  valuable  development  functionality, 
they  are  often  used  outside  the  context  of  an  integrated,  decision-support  architecture.  This  leads  to 
development  of  unreliable  and  brittle  applications  that  cannot  guarantee  the  desired  functionality.  The 
choice  of  tool  has  great  impact  on  the  final  system.  These  tools  can  aid  in  identifying  analysis  requirements 
or  producing  design  specifications  [8].  There  are  many  types  of  tools  which  can  be  used  in  many  different 
combinations. Different tools can be used for particular aspects, e.g., interface design or graphics design. It is
possible  to  delay  the  decision  on  which  tool  to  use  until  after  initial  design  so  that  design  is  not  constrained 
by  using  a  particular  tool.  However,  the  system  may  be  difficult  to  implement. 

A  tool  such  as  System  Architect  ™  can  aid  greatly  in  the  analysis  phase  of  structured  analysis  because  it 
provides  the  ability  to  create  the  necessary  models.  Models  can  still  be  “hand  drawn”  but  CASE-based 
programs like System Architect allow greater flexibility and their ease of use makes them a good choice.
Microsoft  ®  Project  is  a  good  tool  for  building  Gantt  and  PERT  charts  that  help  plan  and  monitor  the 
progression  of  the  phases  of  the  system  development.  Various  database  tools,  such  as  Oracle,  Microsoft  ® 
ACCESS,  SQL  Server,  and  PowerBuilder,  can  facilitate  building  the  knowledge  system  in  the  form  of  a 
relational  database.  The  relational  database  model  is  an  excellent  choice  for  storing  data  because  of  its  fast 
data  access,  support  for  multiple-user  views,  and  its  built-in  querying  language. 

The  user  interface  may  be  the  most  important  aspect  as  it  is  the  user’s  point  of  contact  with  the  DSS.  Some 
issues  in  designing  the  user  interface  include  whether  to  use  a  natural,  non-procedural  language  to  make  the 
interface as user-friendly as possible; and how to present the results of a query. Microsoft ® ACCESS,
Visual Basic, Delphi, Visual C++, and Oracle Forms all allow design of forms that can be tied to database
tables.  These  forms  display  information  in  an  understandable  format.  Other  options  are  a  menu-based 
interface  where  a  user  selects  commands,  or  a  question  and  answer-based  interface  where  a  user  asks  one 
question  at  a  time.  A  rising  technology  is  voice  recognition  software.  There  are  other  classes  of  tools  that 
include  specialized  data  mining,  OLAP,  AI,  and  web-based  DSS  tools.  See  [3]  for  a  comprehensive  listing. 


Fig.  3.  Data  Mining  in  the  DSS  Context  [9] 

Figure  3  shows  the  relationship  between  data  mining  and  other  techniques  as  applied  to  the  historical  data 
warehouse.  Data  mining  helps  a  DSS  to  be  more  intelligent  as  it  can  recognize  patterns  of  data  that 
managers  never  thought  to  look  for.  This  data  can  help  predict  what  might  happen  in  the  future,  allowing 
for  more  informed  decisions. 




OLAP "enables better decision-making by giving business-users quick, unlimited views of multiple
relationships in large quantities of summarized data" [10]. OLAP helps DSSs to be more intelligent since a
greater  depth  of  query  is  allowed.  The  results  of  these  queries  can  be  used  to  forecast  trends  in  business  and 
can  support  management  decisions. 


SOME  DESIGN  ISSUES 

Some  design  issues  deserve  attention.  Creation  of  a  robust  data  delivery  architecture  that  seamlessly 
supports DSS is difficult due to problems associated with metadata definitions. In an IDBS, metadata must
map a user’s familiar entities, e.g., dimensions and attributes, to tables and columns within the database. So,
design  specification  must  include:  (a)  dimension,  attribute  and  metric  definitions;  (b)  attribute  hierarchies 
that  indicate  parent-child  relationships;  (c)  metric-attribute  mapping,  including  physical  mapping. 
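
The three parts of the design specification listed above can be pictured as a small metadata catalogue. The sketch below is a hypothetical illustration, not the prototype's schema: the attribute names, the dim_time and dim_geo tables, and the helper functions are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attribute:
    name: str                     # entity as the user knows it
    table: str                    # physical table (hypothetical)
    column: str                   # physical column (hypothetical)
    parent: Optional[str] = None  # parent in the attribute hierarchy


# Hypothetical catalogue with two hierarchies: Year > Month and Region > City.
METADATA = {
    "Year": Attribute("Year", "dim_time", "year"),
    "Month": Attribute("Month", "dim_time", "month", parent="Year"),
    "Region": Attribute("Region", "dim_geo", "region"),
    "City": Attribute("City", "dim_geo", "city", parent="Region"),
}


def physical(name):
    """Map a user-facing attribute to its physical table.column."""
    attribute = METADATA[name]
    return f"{attribute.table}.{attribute.column}"


def drill_path(name):
    """Hierarchy from the top-level ancestor down to the given attribute."""
    path = []
    while name is not None:
        path.append(name)
        name = METADATA[name].parent
    return list(reversed(path))
```

With such a catalogue, a request phrased in terms of "City" can be translated to dim_geo.city, and a drill-down from "Region" to "City" follows the recorded parent-child link.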

The user interface is an important component of the IDBS. Since the Criteria Builder allows a user to
specify subsets of data retrieved from the data warehouse by constructing condition clauses of dynamic
SQL SELECT statements, it must provide facilities that guarantee reliability and correctness. The interface
should  be  friendly  to  use  and  able  to  create  complex  component  sets  while  abstracting  away  the 
mathematics  underlying  SQL  statements.  The  DSS  engine  should  provide  multiple  analyses  from  a  single 
data  set  by  varying  presentation  and  rotation  options. 
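
A minimal sketch of how a criteria builder might assemble the condition clause of a dynamic SQL SELECT statement is given below. It is an assumption-laden illustration rather than the prototype's implementation: the table and column whitelists, the operator set, and the function name build_select are hypothetical. Values are passed as bound parameters (the ? placeholder style used by several Python database drivers) so that user input never becomes part of the SQL text.

```python
ALLOWED_TABLES = {"sales"}                                 # hypothetical
ALLOWED_COLUMNS = {"product", "state", "month", "amount"}  # hypothetical
ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


def build_select(table, criteria):
    """Build a parameterized SELECT from (column, operator, value) triples."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"rejected table: {table}")
    clauses, params = [], []
    for column, op, value in criteria:
        if column not in ALLOWED_COLUMNS or op not in ALLOWED_OPS:
            raise ValueError(f"rejected criterion: {column} {op}")
        clauses.append(f"{column} {op} ?")
        params.append(value)
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = build_select("sales", [("product", "=", "rod"), ("amount", ">=", 100)])
```

A saved criteria set is then just the list of triples, which can be stored and reused, while the whitelists are one simple way of supporting the reliability and correctness that the interface has to guarantee.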

Although  we  adopt  the  framework  discussed  in  this  paper  to  build  an  IDBS  prototype  with  advanced 
features for improved DSS, some problems still persist. Data uncertainty is prevalent in most
organisations. In some cases, we cannot vouch for the quality of the data upon which decisions are based.
So  we  need  a  robust  data-quality  module  to  improve  the  data  before  they  are  used  in  the  system.  The  best 
idea  is  to  incorporate  quality  criteria  in  the  acceptance  of  operational  data  into  the  databases. 


CONCLUSION 

There  have  been  great  strides  made  in  the  area  of  DSS.  The  use  of  data  mining,  OLAP,  AI,  and  web-based 
tools and techniques has increased the degree of intelligence in supporting the complex decisions of
management.  Despite  all  the  advances,  one  thing  is  certain:  there  is  no  way  to  predict  the  future,  but  a  well- 
designed  DSS  can  help  management  make  decisions  for  the  growth  and  improvement  of  the  company. 
While  some  problems  persist,  the  IDBS  discussed  here  can  play  an  important  role  in  satisfying  business 
requirements  of  various  industries,  e.g.,  materials  manufacturing  and  processing.  Our  design  is  practical 
and  scalable,  and  applicable  to  environments  that  require  semi-structured  or  unstructured  decisions. 


REFERENCES 

1.  Parsaye K. and Chignell M.H., 1994. Quality Unbound, Database Programming and Design,
November (also at: http://www.datamining.com/datamine/qualitv.htm).

2.  Whitten J.L. and Bentley L.D., 1998. Systems Analysis and Design Methods, 4th ed., McGraw-Hill.

3.  Pilot Software, 1998. Introduction to Data Mining: Discover Hidden Value in Your Data Warehouse,
Pilot Software, Inc., Cambridge, MA (also at: http://www.pilotsw.com/dmpaoer/dmindex.htm).

4.  MicroStrategy Inc., 1995. The Case for Relational OLAP, Data Warehouse DSSs, MicroStrategy Inc.
(also at: http://www.strategy.com/DW  Forum/WhitePapers/Case4Rolap).

5.  Ehikioya S.A., 1999. "A Formal Model for the Reuse of Software Specifications". 1999 IEEE
Canadian Conf. on Electrical and Computer Engineering, Edmonton, Alberta, Canada, May 9-12.

6.  Michalowski D. "Intelligent Decision Support: Knowledge Discovery and Applications"
(see: http://www.iiasa.ac.at/research/DAS/res98/nodelO.html).

7.  Ehikioya S.A., 1997. Specification of Transaction Systems Protocols. Ph.D. Thesis, Dept. of Computer
Science, University of Manitoba, Winnipeg, Manitoba, Canada.

8.  Holsapple C.W. and Whinston A.B., 1996. Decision Support Systems: A Knowledge-Based Approach,
West Publishing Company.

9.  Information Discovery Inc., 1997. Perspective on Data Mining: Reaping Benefits from Your Data, A
White Paper, Information Discovery Inc. (also at: http://www.datamining.com/datamine/dm-ka.htm).

10.  Business Intelligence / OLAP, SAS (also at: http://www.sas.com/software/olap).




Intelligent  Production  Management  in  Mining  Systems 

Sean  Dessureault,  Malcolm  Scoble,  Scott  Dunbar 

Department  of  Mining  and  Mineral  Process  Engineering, 
University  of  British  Columbia, 

Vancouver,  British  Columbia,  Canada 


ABSTRACT 

This  paper  attempts  to  characterize  the  mining  process  and  the  principles  of  mine  production  management. 
It  considers  the  integration  of  decision  support  and  information  technologies  into  new  intelligent  systems 
for  mine  production  management.  A  case  study  is  examined  which  considers  the  production  planning  and 
control  procedures  at  an  open  pit  copper  mine.  This  is  used  to  demonstrate  an  approach  to  short  term 
planning  based  on  optimizing  profit,  using  spreadsheet  software  and  linear  programming.  This  work  shows 
that  with  little  cost  and  time,  a  spreadsheet  tool  can  be  developed  in-house  for  more  scientific  and  cost 
effective  mine  production  management.  Such  tools  enable  more  constraints  to  be  considered  and 
management  performance  to  be  evaluated. 


INTRODUCTION 

Decision support tools have evolved in scope, efficacy and availability faster than the mining industry's
ability  to  absorb  them.  Empirical  criteria,  past  procedure  and  intuition  still  represent  the  foundations  for 
mine  production  planning  and  control.  Although  most  mines  have  access  to  linear  programming  and 
simulation  software,  few  use  such  tools  to  improve  production  management.  This  paper  will  demonstrate, 
through  a  case  study,  how  software  can  be  used  to  support  the  formulation  of  mining  plans  and  production 
control  in  a  more  effective  and  intelligent  way.  The  case  study  is  based  on  a  large  open-pit  copper  mine  in 
British  Columbia,  operating  trucks  and  shovels  as  its  primary  materials  handling  system.  A  ten-year-old 
truck  management  (dispatch)  system  is  used  to  optimize  the  productivity  of  the  trucks.  The  dispatch 
system  is  linked  via  radio  to  the  trucks  whose  position  is  determined  using  passive  transponders  at  key 
locations  along  the  haul  routes.  The  mine  is  looking  to  implement  global  positioning  systems  for  its  mobile 
equipment  within  the  mine. 


MINING  SYSTEMS 

Surface  mines  are  typically  very  large  in  scale  and  low  in  grade.  They  represent  a  complex  materials 
handling  system  which  is  challenging  in  terms  of  production  management.  The  first  component  process  in 
the  system  consists  of  drilling  and  blasting  in  order  to  fragment  the  ore  and  waste  for  subsequent  handling. 
Then the primary loading equipment, either cable or hydraulic shovels, loads the material onto hauling
equipment, usually very large haul trucks. The second process is usually based on stationary
equipment  such  as  in-pit  crushers  and  conveyor  belts,  aiming  to  deliver  the  material  to  the  mineral 
processing  plant  (mill).  The  high  capital  and  operating  costs  of  any  fleet  of  mobile  equipment  make  its 
proper  management  a  critical  component  in  mining.  The  frequent  scheduling  of  excavation  from  different 
locations, due to the variable nature of a mine's geology, results in very dynamic production planning.
When  drills,  shovels  and  trucks  move  to  different  locations  often  on  a  daily  basis,  then  planning  in  terms  of 
truck  allocations  and  shovel  positions  becomes  challenging.  The  daily  plan  must  account  for  the  required 
mine  outputs,  such  as  ore  quality  (grade),  as  well  as  ore  and  waste  rock  tonnages  in  order  to  maintain  the 
required  progress  in  excavation  and  feed  to  the  mill.  Quality  control  over  the  mill  feed  would  generally  be 
achieved  according  to  grade,  hardness  and  size  distribution. 

ISSUES  IN  MINE  PRODUCTION  CONTROL 

Fleet  management  is  one  of  the  most  important  issues  in  large  open  pit  mines  using  a  truck  and  shovel 
network.  Planners  must  ensure  that  the  trucks  are  allocated  according  to  the  long  term  plans  for  the  mine, 
the variability of the mineralogical and physico-mechanical characteristics of the material handled, and the
needs of the mill. This sets the sources, tonnages and destinations of rock (ore and waste) to be drilled,
blasted  and  moved  over  time.  It  also  accounts  for  preventative  maintenance  (PM)  activity.  Daily  plans  are 
typically  formed  using  empirical  criteria  and  past  experience. 

The  first  main  issue  is  how  the  dispatch  system  is  used.  Most  dispatch  systems  (certainly  the  older 
systems)  for  truck-shovel  fleets  are  based  on  the  optimization  of  productivity.  When  such  systems  were 
designed,  it  was  assumed  that  with  optimum  productivity  comes  optimal  profit.  With  homogeneous  shovel 
costs  and  a  static  number  and  capacity  of  trucks,  this  may  be  the  case;  however,  considering  variable  shovel 
costs  and  flexible  truck  fleet  size,  designing  for  increased  profitability  may  be  more  lucrative.  Through 
economies  of  scale,  it  has  been  proven  that  larger  shovels  can  usually  move  ore  at  a  fraction  of  the  cost  of 
smaller  shovels.  Working  the  larger  shovels  more  than  the  smaller  shovels  therefore  should  result  in  lower 
cost  per  ton  of  rock  moved.  Dispatch  systems,  however,  typically  maximize  the  productivity  of  the  truck 
fleet  instead  of  minimizing  the  overall  costs  of  that  step  in  the  production  cycle.  The  assumption  that 
maximizing truck productivity will lead to an optimum plan in terms of profit is not always valid. This
assumption also presumes that all the shovels should be used. The overhead and operating costs of working
a  shovel  for  an  entire  shift  when  there  is  no  need  to  utilize  that  particular  shovel  can  make  the  mine  plan 
sub-optimal. 
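
The difference between spreading work over every shovel and favouring the cheapest shovels can be illustrated with a small calculation. The sketch below is not the mine's dispatch logic. It assumes a purely linear cost per ton, ignores haul distances and bench constraints, and uses hypothetical shovel names, capacities and costs (in cents per ton, to keep the arithmetic exact).

```python
def cheapest_first(shovels, required_tons):
    """Assign tonnage to shovels in order of increasing cost per ton.

    shovels is a list of (name, capacity_tons, cost_cents_per_ton).
    Shovels that are not needed are left out of the plan, i.e. idle.
    """
    plan, remaining = {}, required_tons
    for name, capacity, _cost in sorted(shovels, key=lambda s: s[2]):
        if remaining <= 0:
            break
        tons = min(capacity, remaining)
        plan[name] = tons
        remaining -= tons
    if remaining > 0:
        raise ValueError("required tonnage exceeds total shovel capacity")
    return plan


def total_cost_cents(shovels, plan):
    """Cost of a plan under the linear cost-per-ton assumption."""
    cost = {name: c for name, _capacity, c in shovels}
    return sum(tons * cost[name] for name, tons in plan.items())


# Hypothetical fleet: larger shovels move rock more cheaply.
shovels = [("S1", 40000, 20), ("S2", 30000, 30), ("S3", 20000, 50)]
plan = cheapest_first(shovels, 60000)
even_split = {"S1": 20000, "S2": 20000, "S3": 20000}
```

In this made-up case the cheapest-first plan leaves the smallest shovel idle and costs less than the even split, which is the effect described in the paragraph above.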

The  second  main  issue  is  that  the  benefits  of  blending  the  mill  feed  material  according  to  its  hardness  are 
rarely  recognized.  In  the  case  study  discussed  in  this  paper,  the  benefit  of  accounting  for  processing 
variables  in  the  daily  plan  is  recognized.  It  is  assumed  that  certain  blends  of  ore  hardness  will  maximize  the 
recovery  of  the  mineral  processing.  This  blending  issue  is  not  addressed  directly  by  the  truck  dispatch 
system.  It  is  addressed  in  the  daily  planning  through  rule-of-thumb  procedures,  unproven  assumptions  and 
experience.  A  more  automated  manner  of  blending  based  on  scientific  measurements  and  optimization 
algorithms  may  be  a  better  solution  in  the  long  term  and  will  ensure  consistent  quality.  Blending  is  a 
common  problem  that  can  be  easily  solved  using  linear  programming  [1],  yet  no  scientific  management 
procedures  are  used  in  mining  to  take  this  important  variable  into  account. 
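
As an indication of how small such a blending problem can be, consider two ore sources and a limit on the average hardness of the mill feed. With the total tonnage fixed, the linear program has a single free variable and its optimum lies at an endpoint of the feasible interval. The sketch below makes several assumptions that are not in the source: hardness blends linearly by tonnage, the mill imposes an upper limit on average hardness, and all index values, costs and tonnages are hypothetical.

```python
from fractions import Fraction


def blend_two_sources(total, soft, hard, max_hardness):
    """Cheapest split of total tons between a soft and a hard ore source.

    Each source is (hardness_index, cost_per_ton, available_tons).
    Returns (tons_soft, tons_hard), or None when no feasible blend exists.
    """
    h_s, c_s, cap_s = soft
    h_h, c_h, cap_h = hard
    if not h_s < h_h:
        raise ValueError("expected soft hardness < hard hardness")
    # x = tons from the soft source; the hard source supplies total - x.
    lo = max(Fraction(0), Fraction(total) - Fraction(cap_h))
    hi = min(Fraction(total), Fraction(cap_s))
    # Average hardness limit: h_s*x + h_h*(total - x) <= max_hardness*total,
    # which rearranges to x >= (h_h - max_hardness) * total / (h_h - h_s).
    lo = max(lo, Fraction(h_h - max_hardness) * total / Fraction(h_h - h_s))
    if lo > hi:
        return None
    # The cost is linear in x, so the minimum sits at an endpoint of [lo, hi].
    x = min((lo, hi), key=lambda v: c_s * v + c_h * (total - v))
    return x, total - x


# Hypothetical data: soft ore costs more to deliver than hard ore.
blend = blend_two_sources(100, soft=(10, 5, 80), hard=(20, 3, 100), max_hardness=14)
```

With more sources or more quality constraints the same formulation would be handed to a general linear programming solver, such as the one built into a spreadsheet.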

This  paper  examines  a  simple  optimization  approach  developed  to  address  these  issues  in  terms  of  the 
truck-shovel  system  in  use  at  an  open  pit. 


PRODUCT  BLENDING  ISSUES 

Hardness  has  always  been  a  concern  at  this  open  pit  due  to  the  strength  of  the  ore  and  the  high  throughput 
demand  of  the  mill  that  offsets  the  low  grade  of  the  ore  feed.  Hardness,  until  recently,  had  only  been  taken 
into  account  by  visual  examination  in  the  field  by  mine  geologists.  A  hardness  index  was  developed  to 
predict the particular ore's throughput when milled. A single hardness value was assigned to an entire
blasted ore bench. Recently acquired drill performance monitoring technology has revealed that hardness
can  vary  dramatically  within  a  single  ore  bench.  This  new  technology  can  enable  hardness  to  be  integrated 
into  planning  in  a  more  consistent  and  scientific  manner. 


EQUIPMENT  ISSUES 

The  open  pit  currently  owns  and  operates  35  trucks.  Generally,  25-30  trucks  are  used  every  day,  while  the 
rest  of  the  fleet  is  either  under  PM  or  idle.  On  average,  thirty  truck  drivers  are  used  every  shift.  Many 
operators  are  trained  for  other  types  of  equipment  such  as  graders  or  dozers.  Extra  truck  drivers  usually 
operate  service  equipment. 

The  mine  operates  7  shovels:  3  of  which  are  ore  shovels  and  the  rest  are  used  in  waste  (this  parameter  can 
alternate,  depending  on  which  shovel  is  closest  to  the  source  of  blasted  rock).  Due  to  the  PM  schedule,  at 
least  one  shovel  is  down  for  maintenance,  therefore  only  6  shovels  are  usually  in  operation  at  a  single  point 
in  time.  The  shovels  differ  in  size  and  therefore  have  different  costs  per  ton.  The  most  cost-effective 
shovels  should  be  those  with  the  highest  utilization.  This  is  not  the  case,  for  several  reasons.  Firstly,  the 
production constraints may require that a particular bench be cleared to maintain the long-term plan.
Secondly, hardness and grade constraints imposed by the mill sometimes take precedence; thirdly, the
dispatch  system  optimizes  the  truck  allocation  based  on  truck  productivity,  not  shovel  costs. 

The  simple  optimization  approach  adopted  as  a  spreadsheet  tool  indicated  that  too  many  shovels  were  used 
in  the  daily  operations.  The  mine  employs  6  shovel  operators  per  shift,  yet  does  not  have  the  flexibility 
(due  to  union  constraints)  to  change  the  shovel  allocation  or  operator  requirements  on  an  hourly  basis. 
Some  shovel  operators,  however,  could  be  used  on  service  equipment,  similar  to  extra  truck  operators. 


PRODUCTION  CONSTRAINTS 

The  mine  production  plan  must  fulfill  daily  targets  in  terms  of  stripping  waste  and  milling  ore.  The  mill 
was  designed  to  operate  most  efficiently  at  a  particular  hardness  and  grade  of  feed,  therefore  the  daily 
requirements for the ore's grade are also a production requirement. The numerical values of the daily
requirements used in the model are shown at the lower left side of the spreadsheet screen capture shown in
Figure 1, at the end of this document. The table entitled 'Production Requirements' shows variables such as
'ore+' and 'ore-'. These values represent the possible upper and lower limits of ore. For example, the
highest quantity of ore that can be produced is 144000 tons/day (140000 tons/day + (2000 tons/shift x 2
shifts/day)). The production constraints and key variables are also listed in the lower left corner of the
screen  capture. 
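
The arithmetic behind the ore limits can be checked with a short Python sketch. The function name is hypothetical, and the lower limit is assumed to mirror the upper one, since the text only works out the upper limit.

```python
def ore_limits(target_tons_per_day, tolerance_tons_per_shift, shifts_per_day):
    """Return (lower, upper) daily ore limits around the target.

    Assumption: the lower limit is symmetric with the upper limit.
    """
    delta = tolerance_tons_per_shift * shifts_per_day
    return target_tons_per_day - delta, target_tons_per_day + delta


# Figures quoted in the text: 140000 tons/day, 2000 tons/shift, 2 shifts/day.
lower, upper = ore_limits(140000, 2000, 2)
print(lower, upper)
```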


DEVELOPING  A  MINE  PRODUCTION  MODEL 

Since  the  production  model  is  graphically  large,  in  order  to  facilitate  comprehension  and  to  avoid 
accidental  formulae  elimination  within  a  cell,  the  fonts  and  some  cells  can  be  color-coded.  Red  numbers 
can  appear  as  those  for  manual  input,  whilst  black  letters  are  for  variable  values  calculated  from  previous 
tables.  The  spreadsheet  model  for  this  problem  was  developed  as  follows: 

Inputs 

All key variables were entered in the 'Constraints' and 'Production Statistics' tables viewed at the bottom
left corner of the spreadsheet. These key variables are used in the spreadsheet for various calculations and
as constraints. The following also explains some of the important aspects of the model. Specific functions
or equations within the software are not discussed since such a model could be developed using any
modern spreadsheet or other software tools with linear programming abilities.

The  top  five  rows  (cells  B3:C7)  are  values  that  must  be  entered  according  to  the  current  situation  in  the  pit. 
These  can  be  entered  automatically  through  a  macro,  within  the  spreadsheet  software,  that  reads  these 
statistics  directly  from  the  dispatch  computer.  The  top  row  is  the  identification  number  of  a  particular 
shovel.  The  number  of  the  shovel  is  inversely  representative  of  its  age.  For  example,  shovel  12  is  the 
oldest while shovel 18 is the newest (largest and cheapest to operate). The next row is the location of the
shovel.  The  third  row  indicates  the  hardness  index  of  the  current  volume  of  blasted  material  available  to  the 
particular  shovel.  As  discussed  previously,  hardness  can  be  included  in  the  model;  however,  it  will  not  be 
explored  in  this  version  since  the  mine  currently  has  no  direct  policy  governing  blending  objectives.  This 
model  will  deal  only  with  grade  and  production  requirements  as  its  planning  constraints. 

The operating costs per hour are listed in the range C8:P8. The shovel load time and truck rates in
tons/hour are input using the macro that takes the variables directly off the dispatch computer. These rates
can be used to design the next day's plan if the general statistics are expected to stay the same. If
conditions change (for example, if shovel 12 moves to another location), the truck productivity may change
and the optimum plan is no longer the same. These values can also be used to determine the efficiency of
the planning procedure. For example, the actual truck allocations for that day can be input into the
changing cells and compared with the spreadsheet tool's optimized allocation, based on the previous day's
production variables.

The changing variables are allocated to cell range C11:P12. These are the cells that the linear program
solver, embedded within Excel, changes in order to achieve the optimal solution. Rows 18 and 19 show the
constraint that limits the number of truck loads per hour that are allocated to a particular shovel. The
constraint can be expressed by equation 1.

$$\text{max loads/hr} \;\ge\; \text{actual loads/hr} \quad\Longrightarrow\quad \frac{60\ \text{(min/hour)}}{\text{shovel load time (min)}} \;\ge\; \frac{\text{trucks allocated} \times \text{truck rate (t/hr)}}{\text{truck capacity}} \qquad (1)$$

It is recognized that a shovel may spend half a day loading ore, and the rest of the day loading waste. This
would result in two different 'maximum truck loads' per hour. It was assumed the highest truckloads per
hour would be the maximum of the two possibilities.
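
The load-rate constraint can be sketched in a few lines of Python. This is an illustration only: the function names and sample numbers are hypothetical, and the maximum rate is taken to be 60 minutes per hour divided by the shovel load time in minutes, which is an assumption about how equation 1 is meant to be read.

```python
def max_loads_per_hour(shovel_load_time_min):
    """Most truck loads a shovel can complete in one hour (assumed 60 / load time)."""
    return 60.0 / shovel_load_time_min


def actual_loads_per_hour(trucks_allocated, truck_rate_tph, truck_capacity_t):
    """Loads per hour demanded of a shovel by the trucks allocated to it."""
    return trucks_allocated * truck_rate_tph / truck_capacity_t


def allocation_is_feasible(shovel_load_time_min, trucks_allocated,
                           truck_rate_tph, truck_capacity_t):
    """True when the shovel can keep up with the trucks allocated to it."""
    demanded = actual_loads_per_hour(trucks_allocated, truck_rate_tph, truck_capacity_t)
    return max_loads_per_hour(shovel_load_time_min) >= demanded


def max_loads_ore_and_waste(load_time_ore_min, load_time_waste_min):
    """Text's assumption: take the larger of the ore and waste maximum rates."""
    return max(max_loads_per_hour(load_time_ore_min),
               max_loads_per_hour(load_time_waste_min))
```

With a hypothetical 3-minute load time, 200-ton trucks and a truck rate of 600 t/hr, four trucks ask for 12 loads per hour against a maximum of 20, so the allocation is feasible; eight trucks would not be.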

Results 

Basic  and  array  multiplication  equations  are  used  to  calculate  values  such  as  dry  metric  tons  of  ore  and 
waste.  These  values  are  also  used  as  constraints  since  the  mine  wants  to  maximize  ore  recovery  while 
maintaining  the  stripping  ratio.  Average  grade  is  calculated  using  the  weighted  average  and  recovery  is 
calculated  using  a  macro  developed  by  the  milling  department  (developed  several  years  ago  using  empirical 
data).  The  variables  used  in  the  macro  are  the  hardness  values  and  grade  of  that  particular  ore.  The  revenue 
is  calculated  by  equation  2. 

$$\text{dry metric tons/day} \times \text{grade (\%Cu)} \times \text{recovery} \times \text{\$/ton of concentrate} = \text{revenue to mine} \qquad (2)$$

Truck,  shovel  and  milling  costs  are  calculated  using  the  operating  and  milling  costs  (per  hour)  already 
developed  by  the  mine.  Therefore  total  profit  is  simply  the  revenue  less  the  costs. 
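
The results calculations can be illustrated with a minimal Python sketch. The function names are hypothetical, the grade is assumed to be entered in percent (hence the division by 100) and the recovery as a fraction; the mill's empirical recovery macro is not reproduced here.

```python
def weighted_average_grade(tons, grades):
    """Tonnage-weighted average grade over several sources."""
    return sum(t * g for t, g in zip(tons, grades)) / sum(tons)


def daily_revenue(dry_tons_per_day, grade_pct_cu, recovery, price_per_ton):
    """Equation 2, with the grade given in %Cu and recovery as a fraction."""
    return dry_tons_per_day * (grade_pct_cu / 100.0) * recovery * price_per_ton


def daily_profit(revenue, truck_cost, shovel_cost, milling_cost):
    """Total profit is the revenue less the truck, shovel and milling costs."""
    return revenue - (truck_cost + shovel_cost + milling_cost)
```

The three cost arguments are placeholders for whatever hourly costs the mine has already developed, summed over the day.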

Solver  Setup 

To solve the linear problem, the solver embedded within Microsoft Excel was used. The main steps
involved in using the solver are discussed here. The main elements of any linear programming tool
(objective, changing variables, and constraints) will be very similar to what follows:

1. Objective. Maximize profit by selecting the profit solution (cell P34).

2. Changing Cells. For this particular model, the numbers of trucks allocated to the shovels are the
changing variables (C11 to P11 are the changing cells).

3. Constraints. The following constraints are required for this model:

• Maximum number of truck loads per hour
• Ideal grade
• Maximum dry metric tons of ore per day
• Maximum number of trucks available
• Minimum waste stripping in tons per day
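
The three solver elements (objective, changing cells, constraints) can be mimicked in a small Python sketch. It is not the Excel solver and not a linear programming algorithm: it simply enumerates every integer truck allocation for three shovels and keeps the most profitable feasible one. All figures, shovel data and names are hypothetical, the ideal-grade constraint is left out, and an unused shovel is assumed to cost nothing.

```python
from itertools import product

# Hypothetical inputs (not the mine's figures).
TRUCK_RATE_TPH = 400.0      # tons hauled per truck per hour
TRUCK_CAPACITY_T = 200.0    # tons per truck load
TRUCKS_AVAILABLE = 10
TRUCK_COST_PER_HR = 300.0
ORE_VALUE_PER_TON = 5.0
MAX_ORE_TPH = 2400.0        # maximum ore constraint
MIN_WASTE_TPH = 800.0       # minimum waste stripping constraint

SHOVELS = [
    {"id": 12, "ore": True, "max_loads_per_hr": 8.0, "cost_per_hr": 900.0},
    {"id": 18, "ore": True, "max_loads_per_hr": 12.0, "cost_per_hr": 500.0},
    {"id": 15, "ore": False, "max_loads_per_hr": 10.0, "cost_per_hr": 600.0},
]


def evaluate(allocation):
    """Return hourly profit for a truck allocation, or None if infeasible."""
    if sum(allocation) > TRUCKS_AVAILABLE:
        return None
    ore_tph = waste_tph = shovel_cost = 0.0
    for shovel, trucks in zip(SHOVELS, allocation):
        loads_per_hr = trucks * TRUCK_RATE_TPH / TRUCK_CAPACITY_T
        if loads_per_hr > shovel["max_loads_per_hr"]:
            return None
        tons = trucks * TRUCK_RATE_TPH
        if shovel["ore"]:
            ore_tph += tons
        else:
            waste_tph += tons
        if trucks > 0:
            shovel_cost += shovel["cost_per_hr"]  # idle shovels cost nothing here
    if ore_tph > MAX_ORE_TPH or waste_tph < MIN_WASTE_TPH:
        return None
    truck_cost = sum(allocation) * TRUCK_COST_PER_HR
    return ore_tph * ORE_VALUE_PER_TON - truck_cost - shovel_cost


def best_allocation():
    """Exhaustive search over the changing cells (trucks per shovel)."""
    best = None
    for allocation in product(range(TRUCKS_AVAILABLE + 1), repeat=len(SHOVELS)):
        profit = evaluate(allocation)
        if profit is not None and (best is None or profit > best[1]):
            best = (allocation, profit)
    return best
```

With these made-up numbers the search leaves the most expensive ore shovel unused, which is the same kind of outcome the spreadsheet model reports. Exhaustive search only works for a handful of shovels; a real model of this size would need a proper solver.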


DISCUSSION 

The  case  study  implies  that  the  mine  should  allocate  only  20  trucks  and  4  shovels  for  this  particular  mining 
sequence.  Although  not  obvious,  fractions  of  a  truck  per  hour  can  be  considered  in  the  daily  plan.  For 
example,  if  a  single  truck  alternates  between  shovel  12  and  15  then  the  0.6  and  0.5  fractions  can  be 
accounted  for.  Major  financial  savings  could  accrue  by  disregarding  the  traditionally  held  assumption  that 
all  shovels  must  be  used.  Secondary  benefits,  such  as  better  services  due  to  the  increased  staffing  of  extra 
shovel  and  truck  operators  on  service  duty,  may  add  further  savings  that  cannot  be  calculated  directly. 

Certain  available  blasted  material  at  different  locations  may  need  to  be  prioritized  by  the  mine  plan  if 
required.  A  constraint  can  be  added  that  ensures  the  shovel  at  a  particular  ore  bench  loads  out  the  required 
amount.  The  solution  shows  that  the  copper  output  is  maximized  and  that  the  cheapest  shovels  to  operate 
are  fully  utilized.  This  can  be  accounted  for  in  the  automatic  dispatch  system  by  increasing  the  allowable 
average queue time for the most productive shovels. This will ensure their preferred utilization.

As  monitoring  technology  is  introduced  into  the  mine,  the  ore  characteristics  obtained  can  be  used  to 
improve  the  mine  design.  If  computerized  daily  planning  is  further  developed,  then  the  planning  software 
may  be  able  to  account  for  blending  for  optimum  mill  throughput. 




A great deal of emphasis is currently being placed on 4D CAD coupled with economic models [2]. The
long-range planning engineer currently uses these tools at this mine. A short-term planning tool should be
developed to allow optimized daily plans. The spreadsheet solver described in this paper was developed in
under four hours. The previous shift's truck allocation decision was entered into the spreadsheet and
revealed that cost savings of $350,000 would have been possible if the optimum schedule had been
implemented. It should be noted that no bench clearing constraints were input; therefore the true cost
savings may have been less. From this spreadsheet tool, the mine has identified the likely over-capacity of
equipment and personnel being employed daily, although the impact on production flexibility of reducing
labor is not known. This work shows that with little cost and time, a spreadsheet tool can be developed
in-house for more scientific and cost-effective mine production management. This work has also shown that
the assumption that the mine should use all shovels for this production phase is false. The optimum
solution calls for using only 5 shovels. This would allow the mine to avoid the operating,
maintenance and overhead costs involved in operating the shovel left idle.


CONCLUSION 

Scientific methods of planning have been advocated since the early part of this century [3]. Mines have
used queuing theory and other operations management tools during the initial design phase; however, very
little is used in daily operations. The availability of modern spreadsheet software and user-friendly
management texts (many that directly involve spreadsheet software) [1,4] makes intelligent production
management accessible to even the smallest mines. Questioning long-held assumptions would also allow
mines to identify other sources of cost savings.


REFERENCES 

1. Wayne L. Winston and S. Christian Albright, 1998. Practical Management Science: Spreadsheet
Modeling and Applications, New York: Duxbury Press.

2. Adam Wheeler, 1997. The Shape of Things to come at Bjorkdal. Mining Magazine, August, 128-130.

3. Frederick Winslow Taylor, 1967. The Principles of Scientific Management. New York: W. W. Norton
& Company (first published in 1911 by Taylor).

4. Jeffery D. Camm and James R. Evans, 1996. Management Science: Modeling, Analysis and
Interpretation, South-Western College Publishing.


Fig.  1.  Solver  Worksheet  and  Solution. 




Intelligent  Quality  Control 
for  Manufacturing  in  the  Food  Industry 
using  a  New  Fuzzy-Fractal  Approach 

Oscar  Castillo  and  Patricia  Melin 

Computer  Science  Department,  Tijuana  Institute  of  Technology, 
P.O.  Box  4207,  Chula  Vista  CA  91909,  USA, 

Email:  ocastillo@mail.tii.cetvs.mx  emelin@mail.tii.cetvs.mx 


ABSTRACT 

In  this  paper  we  describe  a  new  method  to  perform  automated  quality  control  in  the  food  industry,  based  on 
the  use  of  a  new  fuzzy-fractal  approach.  Traditional  quality  control  in  the  food  industry  consists  mainly  of 
microbiological  and  chemical  techniques  performed  on  samples  of  food  extracted  from  production  lines. 
Traditionally,  the  goal  of  the  quality  control  department  in  the  food  industry  has  been  the  application  of  the 
minimal  number  of  microbiological  and  chemical  techniques  to  the  samples  of  food,  so  as  to  have  a 
decision  on  the  quality  of  the  production  as  quickly  as  possible.  The  main  idea  in  this  paper  is  to  combine 
the  use  of  the  fractal  dimension  as  a  measure  of  classification  of  microorganisms  and  chemicals  with  the 
use of fuzzy logic to simulate the expert evaluation/decision process of production quality. We have
developed an intelligent system that is able to perform automated quality control in the food industry.


INTRODUCTION 

Traditional  quality  control  in  the  food  industry  consists  of  a  long  sequence  of  microbiological  and  chemical 
lab  techniques  that  have  to  be  performed  on  samples  of  food  extracted  from  production  lines.  Traditionally, 
the  goal  of  the  quality  control  department  has  been  the  application  of  the  minimal  number  of 
microbiological  and  chemical  techniques  to  the  samples  of  food,  so  as  to  have  a  decision  on  the  quality  of 
the  production  as  soon  as  possible.  The  goal  of  the  microbiological  and  chemical  techniques  is  the 
identification  of  possible  harmful  microorganisms  and  toxic  chemicals  for  the  final  consumers  of  the  food 
[11,  12].  This  is  the  information  that  is  evaluated  at  the  end,  by  the  human  experts  in  quality  control,  to 
make  the  final  decision  about  the  quality  of  the  production. 

In this paper we describe a new method to perform automated quality control in the food industry, using
fuzzy logic techniques [1, 8] and fractal theory [10]. We also show how to implement this new
method as a computer program to achieve the goal of automated quality control in practice. We use
fractal  theory  in  our  new  method  to  minimize  the  number  of  microbiological  and  chemical  techniques  that 
are  needed  to  make  the  identifications  of  harmful  microorganisms  and  toxic  chemicals.  We  use  a  method 
for  the  identification  of  microorganisms  based  on  the  use  of  the  fractal  dimension,  developed  by  the  authors 
[3],  to  eliminate  the  need  of  applying  a  long  sequence  of  microbiological  techniques  to  the  samples  of  food. 

This  method  of  identification  uses  the  fractal  dimension  as  a  measure  of  classification  of  the 
microorganisms  and  can  greatly  reduce  the  time  needed  to  identify  possible  harmful  microorganisms  for 
the  consumers.  The  computer  program  contains  a  module  to  perform  this  fractal  identification,  which  in 
turn  enables  automated  identification  of  microorganisms  minimizing  the  use  of  microbiological  techniques. 
On  the  other  hand,  we  use  fuzzy  logic  techniques  to  simulate  the  expert  evaluation/decision  process  to 
obtain  the  quality  of  the  production.  The  process  of  quality  evaluation  is  simulated  in  the  computer  program 
using  as  input  the  information  about  the  identifications  of  microorganisms  and  chemicals,  and  then  by 
applying  heuristics  and  statistical  calculations  to  decide  on  the  degree  of  quality  of  the  production.  The 
computer  program  contains  an  "intelligent"  module  to  perform  this  "expert"  simulation,  which  in  turn 
results in automated evaluation of production quality. In conclusion, by combining fuzzy logic techniques
and fractal theory in a computer program, we can achieve the desired goal of
Automated Quality Control in the Food Industry (AQCFI).

0-7803-5489-3/99/$10.00 ©1999 IEEE.

The  use  of  fuzzy  logic  and  fractal  theory  increases  the  efficiency  (in  accuracy  and  in  time)  of  the  quality 
control,  because  the  computer  program  has  the  mathematical  algorithms  (for  the  fractal  dimension)  needed 
to  identify  possible  harmful  microorganisms  and  toxic  chemicals  minimizing  the  use  of  microbiological  or 
chemical  techniques  for  the  identification,  and  also  because  the  computer  program  has  the  knowledge  to 
decide  at  the  end  on  the  final  quality  of  the  food  production.  In  this  paper  the  authors  have  successfully 
generalized  their  previous  work  on  this  matter  [5,  6],  by  using  a  Fractal  Module  to  perform  automated 
identification  of  microorganisms  and  chemicals  and  by  developing  an  Expert  Module  for  evaluation  of  the 
quality  of  the  production  using  fuzzy  logic  techniques. 


DESCRIPTION  OF  THE  NEW  METHOD  FOR 
AUTOMATED  QUALITY  CONTROL 

The  new  method  for  automated  quality  control  in  the  food  industry  consists  mainly  of  two  parts:  one  part 
performs  the  automated  identification  of  microorganisms  and  chemicals  by  using  the  fractal  dimension,  and 
the  second  part  performs  an  automated  evaluation  on  the  quality  of  the  production  by  using  SC  techniques 
and  the  information  obtained  in  part  one.  We  show  in  Figure  1  how  the  new  method  works,  beginning  with 
the  samples  of  food  extracted  from  production  lines  and  ending  with  the  final  evaluation  of  the  production 
quality. We describe below in more detail the two main parts of the new method for AQCFI.


[Figure 1: block diagram. Labels recoverable from the original: Production, Samples of food, Geometrical
forms, Digital Image Processing, Fractal Method, identification info (bacteria, chemicals), AI techniques,
Quality.]

Fig.  1.  New  Method  for  Automated  Quality  Control  in  the  Food  Industry. 

The  samples  of  food  are  extracted  randomly  from  production  lines  (this  is  the  only  part  that  is  not 
automated)  and  prepared  for  identification.  Then  a  Laser  Scan  Microscope  is  used  to  digitize  the 
geometrical  forms  of  bacteria  colonies.  The  digitized  information  is  then  used  as  input  for  the  method  of 
identification  (using  the  fractal  dimension).  Finally,  the  identifications  of  bacteria  and  chemicals  are  used 
as  information  by  a  knowledge  base  to  decide  on  the  general  quality  of  the  production. 


DESCRIPTION OF THE METHOD FOR IDENTIFICATION OF MICROORGANISMS
AND CHEMICALS USING THE FRACTAL DIMENSION

We describe briefly in this section a new method for the identification of microorganisms for the food
industry, developed by the authors [2], that is based on the use of the fractal dimension [10]. This method
uses the fractal dimension to make a unique classification of the different types of microorganisms, because
it is a known experimental fact that the colonies of different types of bacteria have different geometrical
forms. The problem is then to find a one-to-one map between the different types of bacteria and their
corresponding fractal dimensions, in this way obtaining a unique method of identification of
microorganisms for the food industry. The first step in obtaining this map is to find experimentally (in the
microbiological  lab)  the  different  geometrical  forms  for  the  bacteria.  The  second  step  is  to  calculate  the 
corresponding  fractal  dimensions  for  the  different  types  of  bacteria.  This  fractal  dimension  can  be 
calculated  for  a  selected  type  of  bacteria  with  several  samples,  to  obtain  as  a  result  a  statistical  estimation  of 
the  dimension  and  the  corresponding  error  of  the  estimation. 




The method for identification of microorganisms using the fractal dimension can be stated mathematically
in the following form: let M be a one-to-one map between the sets $I_m$ and $D$, where the set $I_m$ can be
called the set of identifications of microorganisms and the set $D$ can be called the set of fractal
dimensions. The set of identifications can be as follows:

$$I_m = \{\text{Staphylococcus aureus},\ \text{Streptococcus fecalis},\ \text{Pseudomonas aeruginosa},\ \ldots\}$$

and the set of fractal dimensions can be as follows:

$$D = \{D_{sa},\ D_{sf},\ D_{pa},\ \ldots\}$$

where $D_{sa}$ is the fractal dimension of the Staphylococcus aureus bacteria, $D_{sf}$ is the fractal
dimension of the Streptococcus fecalis bacteria, $D_{pa}$ is the fractal dimension of the Pseudomonas
aeruginosa bacteria and so on. The fractal dimension of an object can be defined approximately as follows:

$$d = \frac{\ln N(r)}{\ln(1/r)}$$

where  N(r)  is  the  number  of  boxes  covering  the  object  and  r  is  the  size  of  the  box.  In  this  case,  the  object  is 
the  colony  for  a  particular  bacteria.  An  approximation  of  the  fractal  dimension  can  be  obtained  by  counting 
the  number  of  boxes  covering  the  object  for  different  r  sizes  and  then  performing  a  logarithmic  regression 
to  obtain  d  (box  counting  algorithm). 
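
The box counting estimate can be sketched in plain Python as follows. This is a minimal illustration, not the authors' program: the colony is assumed to be given as a set of occupied pixel coordinates, and the function names and the choice of box sizes are hypothetical.

```python
import math


def box_count(points, box_size):
    """N(r): number of boxes of side box_size that contain at least one point."""
    return len({(x // box_size, y // box_size) for x, y in points})


def box_counting_dimension(points, box_sizes):
    """Slope of ln N(r) against ln(1/r), fitted by ordinary least squares."""
    xs = [math.log(1.0 / r) for r in box_sizes]
    ys = [math.log(box_count(points, r)) for r in box_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Sanity check on shapes of known dimension: a filled square and a straight line.
square = {(x, y) for x in range(64) for y in range(64)}
line = {(x, 0) for x in range(64)}
sizes = [1, 2, 4, 8, 16]
print(box_counting_dimension(square, sizes))
print(box_counting_dimension(line, sizes))
```

The filled square and the line should give estimates close to 2 and 1 respectively; a digitized bacteria colony would be expected to fall somewhere in between.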


DESCRIPTION  OF  THE  METHOD  TO  PERFORM 
AUTOMATED  EVALUATION  OF  PRODUCTION  QUALITY 

We  describe  briefly  in  this  section  the  method  to  perform  the  evaluation  of  production  quality  using  SC 
techniques.  This  method  simulates  the  expert  evaluation/decision  process  involved  in  obtaining  the  degree 
of  quality  of  the  production.  This  method  uses  as  input  the  information  obtained  by  the  "Fractal  Method", 
i.e.  the  identification  of  microorganisms  and  chemicals,  and  then  applies  heuristics  and  statistical 
calculations  (implemented  as  fuzzy  rules)  to  decide  on  the  quality  of  the  production.  This  method  uses 
expert  knowledge  to  decide  if  a  microorganism  can  be  harmful  to  the  consumer  or  if  a  chemical  can  be 
toxic  to  the  consumer.  Also,  the  method  uses  expert  knowledge  to  decide  on  the  degree  of  quality  of  the 
production.  Both  kinds  of  expert  knowledge  can  be  implemented  in  MATLAB  as  a  set  of  fuzzy  rules 
(knowledge base) for the computer program. MATLAB was chosen for its symbolic and
numeric manipulation features and because it is well suited to developing prototypes.


The  use  of  fuzzy  logic  in  manufacturing  applications  has  been  well  recognized  [1,9]  and  many  applications 
have  been  developed.  In  this  case,  we  came  to  the  conclusion  that  the  best  way  to  convey  the  information 
about  the  quality  level  of  a  manufactured  food  product  was  using  fuzzy  sets  [8].  Also,  we  think  that  the  best 
way to reason with uncertainty in this case is using fuzzy logic. In a prior prototype developed by the
authors [5] we did not use fuzzy logic in the simulation of the expert decision process. However,
when we considered expanding the number of microorganisms and chemicals for the system, we ran
into problems with the consistency of the knowledge base. We then realized the need for
reasoning with uncertainty and, as a result, for a rule base consistent with this approach. Our
choice for reasoning with uncertainty was fuzzy logic, considering the success that this theory has achieved
in the fields of manufacturing and engineering.


DESCRIPTION  OF  THE  INTELLIGENT  SYSTEM 

The  modules  of  the  intelligent  system  are:  the  Expert  Module,  the  Fractal  Module,  the  Interface  and  the 
Inference  Engine.  The  description  of  the  Expert  Module  and  the  Fractal  Module  is  given  below,  because  it 
is  very  interesting  from  the  point  of  view  of  the  application  to  manufacturing  in  the  food  industry.  We  also 
give  a  description  of  the  implementation  strategies  used  in  developing  the  computer  system. 




DESCRIPTION  OF  THE  FRACTAL  MODULE 

The Fractal Module consists of a computer program that is an implementation of the method for the
identification of microorganisms and chemicals using the fractal dimension described before. This
computer program uses the geometrical forms of the colonies of microorganisms, obtained by the Interface
of the system from the samples of food, to estimate the fractal dimension (box dimension) using a known
algorithm [10]. It then compares this value with the data base of known fractal dimensions and their
corresponding identifications to conclude which microorganisms are present in the samples of food. The
program also uses the geometrical forms of the spectra of unknown chemical compounds in the samples of
food to estimate their fractal dimensions, and then compares these values with the data base of known
chemical identifications to conclude which chemical compounds are present in the samples. In Table 1 we
show part of the data base of fractal dimensions for different types of microorganisms obtained by
extensive prior experimental microbiological work done by the authors [2].


Table 1. Data base for the classification of microorganisms

Number   Fractal Dimension   Microorganism Identification
1        D_sa ± e_sa         Staphylococcus aureus
2        D_sf ± e_sf         Streptococcus fecalis
3        D_pa ± e_pa         Pseudomonas aeruginosa
4        D_st ± e_st         Salmonella typhi
5        D_ec ± e_ec         Escherichia coli

In this table D_i denotes the average fractal dimension expected for a particular type of microorganism and
e_i denotes the corresponding statistical error associated with the identification. We can then interpret the
value D_i ± e_i as the confidence interval for each particular identification shown in the table.
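
A lookup against such a data base can be sketched in Python as below. The numerical values are invented placeholders chosen only to make the example run (the paper does not list the measured dimensions), and the function name is hypothetical.

```python
# Placeholder entries: (identification, mean dimension D_i, error e_i).
# These numbers are NOT measured values; they only illustrate the lookup.
FRACTAL_DB = [
    ("Staphylococcus aureus", 1.20, 0.03),
    ("Streptococcus fecalis", 1.35, 0.04),
    ("Salmonella typhi", 1.50, 0.05),
]


def identify(dimension, database=FRACTAL_DB):
    """Return every identification whose interval D_i ± e_i contains dimension."""
    return [name for name, d_i, e_i in database if abs(dimension - d_i) <= e_i]


print(identify(1.21))   # falls inside the first placeholder interval
print(identify(1.28))   # falls between intervals, so nothing is returned
```

Returning a list rather than a single name keeps the sketch honest about two cases the text does not discuss: a measured dimension that matches no interval, and intervals that overlap.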

DESCRIPTION  OF  THE  EXPERT  MODULE 

The  Expert  Module  of  the  computer  system  is  an  implementation  of  the  method  to  perform  automated 
evaluation  of  production  quality  described  before.  The  Expert  Module  uses  the  information  given  by  the 
Fractal  Module  to  obtain  the  quality  of  the  production,  i.e.,  the  Expert  Module  uses  the  identifications  of 
microorganisms  and  chemicals  found  in  the  samples  of  food  to  decide  if  the  quality  of  the  production  meets 
the  requirements  of  acceptance.  The  knowledge  base  of  the  Expert  Module  consists  of  a  set  of  fuzzy  rules 
containing  the  knowledge  of  the  human  experts  for  the  domain  of  quality  control  in  the  food  industry.  The 
knowledge base consists of two parts: one part contains the knowledge to decide if a microorganism is
harmful to the consumer or if a chemical is toxic to the consumer; the second part contains the knowledge
to decide on the degree of quality of the production.

Suppose  that  n  samples  of  food  are  extracted  from  production  lines  during  the  manufacturing  process,  and 
that  the  Fractal  Module  makes  the  corresponding  identifications  of  microorganisms  and  chemicals  found  in 
the  samples.  The  output  of  the  Fractal  Module  will  be  a  list  of  microorganisms  and  the  percentages  of  the 
samples  that  they  were  found  in  and  the  same  for  the  chemicals.  For  example,  in  the  case  that  only  three 
bacteria  were  found  the  output  can  be  written  as  follows: 

Output  =  {(BacteriaX,  %samplesX),  (BacteriaY,  %samplesY),  (BacteriaZ,  %samplesZ)} 

With  the  percentage  of  samples  for  each  Bacteria,  we  can  calculate  the  Total  Percentage  of  Bacteria  that 
can  cause  Infection  or  that  can  cause  Intoxication.  For  example,  if  in  the  above  case  only  Bacteria  X  and 
Bacteria  Y  can  cause  infection,  then: 

%Bacteria_Infection  =  %samplesX  +  %samplesY 
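
The bookkeeping described above can be sketched in Python. The function names and the sample data are hypothetical; each sample is represented as the set of bacteria identified in it, and the total is the plain sum of percentages used in the formula above.

```python
def fractal_module_output(samples):
    """List (name, % of samples containing it) from per-sample identification sets."""
    n = len(samples)
    names = sorted(set().union(*samples))
    return [(name, 100.0 * sum(name in s for s in samples) / n) for name in names]


def total_percentage(output, harmful):
    """Sum the %samples of the bacteria listed in harmful."""
    return sum(pct for name, pct in output if name in harmful)


# Four hypothetical samples; the third one is clean.
samples = [{"BacteriaX", "BacteriaY"}, {"BacteriaX"}, set(), {"BacteriaZ"}]
output = fractal_module_output(samples)
bacteria_infection = total_percentage(output, {"BacteriaX", "BacteriaY"})
print(output, bacteria_infection)
```

Because a sample can contain more than one bacterium, a sum of this kind counts such a sample once per bacterium, so the total is an index rather than a strict share of the samples.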


Using  as  variables  the  Total  Percentage  of  Bacteria  that  can  cause  Infection  and  the  Total  Percentage  that 
can  cause  Intoxication,  we  can  classify  the  "Microbiological  Evaluation  of  the  Production"  as  one  of  the 
three  following  fuzzy  sets:  GOOD,  REGULAR  and  BAD.  A  similar  classification  can  be  done  with  the 
"Chemical  Evaluation  of  the  Production",  using  the  fuzzy  sets:  GOOD,  REGULAR  and  BAD.  In  our 
implementation  the  membership  functions  were  found  using  heuristics  from  the  experts  in  microbiology 
and  chemistry  for  the  food  industry. 

We  classify  "Production  Quality"  using  four  fuzzy  sets:  EXCELLENT,  GOOD,  REGULAR  and  BAD.  This 
classification  depends  on  two  variables:  Microbiological  Evaluation  and  Chemical  Evaluation.  We  show  in 
Figure  2  the  membership  functions  for  the  values  of  the  "Quality"  linguistic  variable.  The  membership 
functions  were  defined  in  the  membership  function  editor  of  the  Fuzzy  Logic  Toolbox. 


Fig.  2.  Membership  functions  for  the  linguistic  values  of  the  quality  variable. 


We  show  in  Figure  3  the  fuzzy  rule  base  for  the  prototype  intelligent  system  developed  in  the  Fuzzy  Logic 
Toolbox  of  the  MATLAB  programming  language. 


Rule Editor: foodqual

1. If (micro-eval is good) and (chem-eval is good) then (quality is excellent) (1)
2. If (micro-eval is good) and (chem-eval is regular) then (quality is good) (1)
3. If (micro-eval is good) and (chem-eval is bad) then (quality is regular) (1)
4. If (micro-eval is regular) and (chem-eval is good) then (quality is good) (1)
5. If (micro-eval is regular) and (chem-eval is regular) then (quality is regular) (1)
6. If (micro-eval is regular) and (chem-eval is bad) then (quality is bad) (1)
7. If (micro-eval is bad) and (chem-eval is good) then (quality is regular) (1)
8. If (micro-eval is bad) and (chem-eval is regular) then (quality is bad) (1)
9. If (micro-eval is bad) and (chem-eval is bad) then (quality is bad) (1)


Fig.  3.  Fuzzy  rule  base  in  the  rule  editor  of  fuzzy  logic  toolbox. 
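
How a rule base of this kind produces a quality score can be sketched in plain Python. This is not the MATLAB Fuzzy Logic Toolbox and not the authors' system: the 0 to 100 input scale, the triangular membership functions and the output values are hypothetical, the (bad, bad) rule is assumed, AND is taken as the minimum, and a weighted average of single output values replaces centroid defuzzification.

```python
def tri(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical membership functions on a 0-100 evaluation scale.
INPUT_SETS = {"bad": (0, 0, 50), "regular": (0, 50, 100), "good": (50, 100, 100)}
# Hypothetical representative output value for each quality set.
OUTPUT_PEAKS = {"bad": 0.0, "regular": 35.0, "good": 70.0, "excellent": 100.0}

# (micro-eval, chem-eval) -> quality; the (bad, bad) entry is an assumption.
RULES = {
    ("good", "good"): "excellent", ("good", "regular"): "good",
    ("good", "bad"): "regular", ("regular", "good"): "good",
    ("regular", "regular"): "regular", ("regular", "bad"): "bad",
    ("bad", "good"): "regular", ("bad", "regular"): "bad",
    ("bad", "bad"): "bad",
}


def evaluate_quality(micro, chem):
    """Return (crisp quality score, strongest quality label)."""
    if not (0 <= micro <= 100 and 0 <= chem <= 100):
        raise ValueError("evaluations must lie between 0 and 100")
    strengths = {}
    for (m_label, c_label), q_label in RULES.items():
        # AND as minimum; rules with the same consequent are combined by maximum.
        w = min(tri(micro, *INPUT_SETS[m_label]), tri(chem, *INPUT_SETS[c_label]))
        strengths[q_label] = max(strengths.get(q_label, 0.0), w)
    total = sum(strengths.values())
    score = sum(w * OUTPUT_PEAKS[q] for q, w in strengths.items()) / total
    return score, max(strengths, key=strengths.get)
```

Sweeping both inputs over a grid and plotting the returned score would give a surface of the same general kind as the one produced by the toolbox, although its exact shape depends on the assumed membership functions.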


We  show  in  Figure  4  the  non-linear  surface  for  the  problem  of  quality  evaluation  using  as  input  variables: 
microbiological evaluation and chemical evaluation. The three-dimensional surface represents the
non-linear fuzzy model for the problem.




Fig.  4.  Non-linear  surface  for  quality  evaluation. 


CONCLUSIONS 

We  have  developed  a  computer  system  for  automated  quality  control  of  the  manufacturing  process  for  the 
food  industry.  The  computer  system  is  an  implementation  of  a  new  method  developed  by  the  authors  for 
automated  quality  control,  based  on  the  use  of  SC  techniques  and  fractal  theory.  In  a  prior  prototype  system 
developed  by  the  authors  [3]  the  use  of  SC  was  only  for  the  selection  of  the  microbiological  techniques 
needed  to  identify  the  microorganisms  present  in  the  samples  of  food  and  this  is  only  part  of  the  problem  in 
quality  control.  Then  in  other  prototypes  developed  by  the  authors  [4,  5,  6]  we  introduced  the  fractal  theory 
method  to  help  in  the  identification  of  microorganisms.  In  the  computer  system  presented  now  in  this 
paper, the Fractal Module eliminates the need to use microbiological techniques for the identification of
microorganisms.  Also,  the  Fractal  Module  identifies  chemicals  present  in  the  samples  of  food.  In  the 
system  described  in  this  paper,  the  use  of  SC  techniques  is  for  the  simulation  of  the  decision  process  of 
finding  if  a  microorganism  can  be  harmful  or  a  chemical  can  be  toxic  for  the  consumers.  Also,  the  system 
uses  fuzzy  logic  techniques  for  the  simulation  of  the  evaluation  process  of  the  quality  of  the  production 
using  the  information  given  by  the  Fractal  Module. 


REFERENCES 

1.  Badiru,  A.B.,  1992.  Expert  Systems  Applications  in  Engineering  and  Manufacturing,  Prentice-Hall. 

2.  Castillo,  O.,  Melin,  P.,  1994.  Developing  a  new  method  for  the  identification  of  microorganisms  for  the  food 
industry  using  the  fractal  dimension,  Journal  of  Fractals,  2(3),  457-460. 

3.  Castillo,  O.,  Melin,  P.,  1994.  An  Intelligent  System  for  the  Identification  of  microorganisms  for  quality  control  in 
the  food  industry,  Proceedings  of  the  Ninth  International  Conference  on  Applications  of Artificial  Intelligence  in 
Engineering, Wessex Institute of Technology, U.K., 133-140.

4.  Castillo,  O.,  Melin,  P.,  1995.  Automated  Quality  Control  in  the  Food  Industry  combining  Artificial  Intelligence 
Techniques  and  Fractal  Theory,  Proceedings  of  the  Tenth  International  Conference  on  Applications  of  Artificial 
Intelligence in Engineering, Wessex Institute of Technology, U.K., 109-118.

5.  Castillo,  O.,  Melin,  P.,  1995.  QUACONTRA:  Quality  Control  Training  in  the  Food  Industry  using  an  Intelligent 
Tutor, Proceedings of the Ninth International Conference on Industrial and Engineering Applications of Artificial
Intelligence  and  Expert  Systems,  Gordon  and  Breach  Publishers,  Australia,  835-844. 

6.  Castillo,  O.,  Melin,  P.,  1996.  Automated  Quality  Control  for  Manufacturing  in  the  Food  Industry  using  Fuzzy 
Logic and Fractal Theory, Proceedings of DKSME'96, Tempe, Arizona, USA, 349-360.

7.  Castillo,  O.,  Melin,  P.,  1997.  Intelligent  Quality  Control  for  Manufacturing  in  the  Food  Industry  using  Fuzzy 
Logic Techniques and Fractal Theory, Proceedings AISC'97, Acta Press, Banff, Canada, 100-103.

8.  Kosko,  B.,  1992.  Neural  Networks  and  Fuzzy  Systems,  Prentice-Hall. 

9.  Kusiak,  A.,  1992.  Intelligent  Design  in  Manufacturing,  John  Wiley  and  Sons. 

10.  Mandelbrot,  B.,  1983.  The  Fractal  Geometry  of  Nature,  W.  H.  Freeman  and  Company. 

11. Pelczar, M.J., Reid, R.D., 1982. Microbiology, McGraw Hill.

12.  Pettipher,  G.L.,  Rodriguez,  M.V.,  1987.  Rapid  Enumeration  of  Microorganisms  in  Food  by  the  Direct 
Epifluorescent  Filter  Technique,  Applied  and  Environmental  Microbiology,  52(3),  115-117. 




Design  Tool  for  Assessing  Manufacturing 
and  Environmental  Impact 

Daniel  Rochowiak1,  Sherri  Messimer2,  Phillip  A.  Farrington2, 
Dawn  Russell2,  Raymond  D.  Harrell3  and  Daniel  A.  Holder3. 


1Department of Computer Science
2Department of Industrial and Systems Engineering
and Engineering Management
The  University  of  Alabama  in  Huntsville 
3US  Army  AMCOM,  Redstone  Arsenal,  Alabama,  USA 


ABSTRACT 

This  paper  outlines  a  design  tool  that  incorporates  environmental  and  quality  concerns  into  a  traditional 
simulation  environment.  The  Design  Tool  for  Assessing  Manufacturing  Environmental  Impact  (DTAME) 
allows  a  user  to  concurrently  consider  the  impact  of  environmental,  quality,  cost  and  production  attributes 
early  in  the  product  life  cycle.  DTAME  aids  in  making  environmentally  conscious  decisions  and  allows 
designers  to  understand  the  consequences  of  their  decisions  regarding  manufacturing  options. 


INTRODUCTION 

As the DoD undergoes reshaping and resizing to achieve a more affordable defense capability, it is also
important that weapon systems be developed and manufactured in an environmentally conscious manner.
Achieving this goal requires a change in the paradigm used to view the weapon system life cycle. In
today's paradigm, environmental concerns are addressed as de facto activities after the product,
process, and manufacturing plan have been established. Many studies have shown that this reactive
approach is not effective. Currently, the DoD and its contractors are in the process of "cleaning up"
their facilities and weapon system designs due to inadequate environmental planning.

The drive for improved performance at a lower cost often results in the use of new and emerging
materials and manufacturing processes. Since these processes and materials are typically not well
understood, it is difficult to apply a life-cycle approach to detecting manufacturing and
environmental problems early in the product's life cycle. For instance, many designers are encouraged
to use polymer-based composite materials in their designs because of properties such as corrosion
resistance and high strength-to-weight ratios, although design engineers often misunderstand the
repercussions of the associated composites manufacturing processes.

Currently,  Program  Management  Office  personnel  do  not  have  the  expertise  to  address  the  environmental 
impacts  of  design  decisions  made  during  the  design  process.  This  paradigm  change  requires  that 
environmental  concerns  be  viewed  as  an  important  factor  in  the  trade-off  decision  making  that  must  occur 
during  the  early  development  phases.  Assessing  pollution  impacts  and  energy  consumption  during  early 
product  development  will  result  in  long-term  savings  and  a  significant  reduction  in  pollution  and  hazardous 
waste  generation  [1,2]. 

Discrete  event  simulation  provides  an  effective  tool  for  evaluating  system  configurations  and  new 
processing  strategies;  in  fact,  simulation  is  often  the  only  viable  choice  for  analysis  of  complex 
manufacturing  systems.  Advances  in  simulation  software  have  led  to  the  design  of  user-oriented  modular 
simulation  packages  thereby  decreasing  the  time  needed  to  build  a  simulation  model.  However,  these  tools 
have  not  directly  addressed  the  issues  of  simulation  in  the  context  of  environmental  concerns  and  multiple 


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




performance  metrics.  The  typical  tool  concentrates  on  providing  throughput  and  capacity  information  for 
assessing the impact of an individual station's speed and reliability on overall system performance. By
combining  modular  simulation  software  techniques  and  life  cycle  design  with  the  domain  knowledge  of  a 
specific  manufacturing  technology,  a  simulation  system  can  be  provided  that  not  only  generates  traditional 
production  output  but  also  integrates  environmental,  quality,  and  cost  criteria  into  the  simulation. 

The  Design  Tool  for  Assessing  Manufacturing  Environmental  Impact  (DTAME)  was  developed  in  order  to 
merge  the  concepts  of  life  cycle  design  with  modular  simulation  techniques.  The  simulation  models  can  be 
used  to  generate  a  complete  material  balance  around  a  particular  manufacturing  process  and  can  be  made 
available  to  design  engineers  to  aid  in  life  cycle  design  assessment.  The  output  of  the  DTAME  simulation 
model  provides  design  engineers,  program  managers  and  integrated  product  teams  with  information  on 
energy  usage  and  a  complete  material  balance.  This  information  includes  the  types  and  quantities  of  scrap 
and  hazardous  material  produced,  as  well  as  the  more  traditional  system  analysis  results  including  queue 
lengths  and  production  lead  times.  The  goal  is  to  develop  the  tools  that  allow  users  to  make  decisions  with 
a  high  degree  of  confidence  about  whether  a  proposed  system  meets  design  constraints  and  is 
environmentally  and  economically  preferable. 

The  targeted  domain  of  the  DTAME  system  is  polymer-based  composite  materials  and  their  associated 
processes.  The  basic  steps  for  fabricating  composite  material  parts  are:  begin  with  a  plastic  matrix  and 
reinforcement,  co-mingle  the  matrix  and  reinforcement,  form  the  co-mingled  composite  into  the  part 
geometry,  cure  or  heat  the  composite,  and  finally,  perform  any  required  finishing  or  joining  operations. 
While  typical  finished  composite  parts  are  chemically  benign  and  pose  little  ecological  threat, 
environmental  concerns  and  issues  arise  in  the  basic  manufacturing  steps  [3].  Environmental  regulatory  acts 
that  establish  materials  that  constitute  hazardous  waste,  that  regulate  treatment  and  disposal,  and  that 
establish  reporting  requirements  for  chemical  release,  waste  reduction,  recycling  and  energy  recovery, 
impact  on  the  manufacture  of  composite  parts.  Ignorance  of  these  regulations  can  lead  to  severe  penalties. 
Hence,  composite  manufacturers  are  sensitive  to  these  regulations  and  must  be  apprised  of  revisions. 


DTAME  SYSTEM  ARCHITECTURE 

Figure  1  illustrates  the  architecture  of  the  DTAME  system.  It  is  composed  of  three  major  modules,  in 
addition  to  the  supporting  modules  for  communication  and  integration.  The  analysis  of  a  proposed  system 
begins  with  the  CDMCS,  a  critiquing  system  designed  to  provide  high  level  expertise  in  determining 
whether  a  particular  process  is  appropriate  for  the  specified  design.  The  next  stage  is  the  simulation 
environment,  called  SimBuilder,  where  information  regarding  environmental  and  production  parameters  is 
collected  and  exercised  through  discrete  event  simulation.  Finally,  the  simulation  output  is  submitted  for 
optimization.  The  interface  is  designed  to  facilitate  the  entry  of  information  into  the  DTAME  system.  This 
is  accomplished  by  providing  the  user  with  a  set  of  specially  designed  input  screens  and  menu  selections 
that  are  tailored  for  the  manufacturing  process  being  evaluated. 

CDMCS 

The Composite Design and Manufacturing Critiquing System (CDMCS) assists a design engineer by
critiquing the proposed manufacturing methods for a particular design [4]. The critique is intended to aid
the decision by providing evaluative information based on domain knowledge. The critique provided by
CDMCS is built by comparing the design parameters for a specified part against a set of design rules and
parametric relationships that govern the acceptability of individual composite part manufacturing processes.
The rules and metrics that qualitatively simulate an expert's knowledge are divided into three categories:
requisite  metrics,  core  metrics  and  enabling  metrics.  Requisite  metrics  are  Boolean  in  nature  and  must  be 
satisfied.  Core  metrics  must  be  satisfied  to  a  high  degree.  Finally,  enabling  metrics  are  those  metrics  that 
are  not  vital  to  the  acceptability  of  a  candidate  process,  but  enhance  or  detract  from  its  desirability.  An 
aggregate  score  for  the  candidate  process  is  obtained  by  analyzing  each  of  the  metrics  that  apply  to  a 
specified  candidate  process.  The  success  values  for  the  metrics  are  compiled  into  an  overall  score.  The 
aggregate  score  for  the  process  is  then  mapped  to  a  qualitative  rating  from  very  poor  to  highly  acceptable. 
The  qualitative  rating  is  used  to  produce  both  explanatory  text  and  graphical  representations.  The  system 
also  provides  suggestions  for  improving  the  aggregate  score. 
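The scoring scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CDMCS implementation: the 0-1 satisfaction scale, the 0.8/0.2 weighting of core and enabling metrics, the rating thresholds, and the intermediate rating labels are all hypothetical. Only the three metric categories and the end points "very poor" and "highly acceptable" come from the text.

```python
from dataclasses import dataclass

@dataclass
class Metric:
    name: str
    kind: str     # "requisite", "core", or "enabling"
    value: float  # degree of satisfaction in [0, 1]; requisite metrics use 0 or 1

# Assumed upper bounds for each qualitative rating (hypothetical thresholds).
RATINGS = [
    (0.2, "very poor"),
    (0.4, "poor"),
    (0.6, "marginal"),
    (0.8, "acceptable"),
    (float("inf"), "highly acceptable"),
]

def _mean(values, default):
    return sum(values) / len(values) if values else default

def aggregate_score(metrics, core_weight=0.8):
    """Combine metric values into one score in [0, 1].

    Requisite metrics act as a Boolean gate: any failure gives a score of 0.
    Core metrics dominate the score; enabling metrics only adjust it.
    """
    if any(m.kind == "requisite" and m.value < 1.0 for m in metrics):
        return 0.0
    core = _mean([m.value for m in metrics if m.kind == "core"], default=1.0)
    enabling = _mean([m.value for m in metrics if m.kind == "enabling"], default=core)
    return core_weight * core + (1.0 - core_weight) * enabling

def rating(score):
    """Map an aggregate score to a qualitative rating."""
    for upper, label in RATINGS:
        if score < upper:
            return label
    return RATINGS[-1][1]
```

Treating requisite metrics as a gate rather than as one more weighted term reflects the statement that they "must be satisfied"; a weighted sum alone would let strong enabling metrics mask a failed requirement.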




Fig.  1.  DTAME  Architecture 


The  use  of  a  critiquing  system  is  a  central  feature  of  DTAME  as  it  provides  guidance  about  the  evolving 
simulation  definition  and  allows  a  user  to  choose  an  appropriate  manufacturing  process  to  meet  the  design 
specifications.  A  graphical  user  interface  allows  the  user  to  operate  the  system  and  observe  results. 
Network  links  connect  the  system  to  remote  databases  that  allow  for  the  most  current  selections  of  materials 
and  other  parameter  values. 

SimBuilder 

Whether  maintaining  and  improving  existing  production  lines  or  designing  new  lines,  simulations  are 
employed  to  evaluate  and  compare  alternatives.  Simulation  is  often  the  only  viable  choice  for  analysis  of 
complex  manufacturing  systems  especially  where  there  is  a  high  degree  of  interdependence  between  design, 
process  equipment,  and  process  control.  Simulation  provides  an  effective  tool  for  evaluating  system 
configurations  and  new  processing  strategies.  If  properly  constructed  and  maintained,  a  simulation  model  of 
an  existing  production  line  can  be  used  to: 

• evaluate the impact of product mix changes,

• evaluate the impact of an individual station's speed and reliability on overall system performance,

• compare the system throughput and capacity with different process configurations,

• provide environmental data that can be integrated with process cost information to develop more detailed
and accurate models of processing, and

• compare the performance of different system configurations required by competing design technologies.

The  SimBuilder  module  is  used  to  rapidly  define,  model,  and  evaluate  a  proposed  manufacturing  system 
once  a  manufacturing  process  has  been  specified  in  CDMCS.  This  tool  supports  rapid  prototyping  and 
concurrent  engineering  by  creating  a  modeling  environment  that  improves  the  clarity  of  the  model, 
increases productivity, reduces the modeler's need to know the details of a simulation language, and
provides for easier maintenance and improved documentation. The SimBuilder simulation system is
composed of three subsystems: the input system, the simulation, and the output system.

The  purpose  of  the  input  system  is  to  ensure  that  all  required  information  is  obtained  from  the  user  and 
available  for  use  within  the  system.  Icons,  which  represent  the  various  equipment  elements,  can  be  placed 
anywhere  in  a  two-dimensional  graphical  workspace.  The  user  interactively  responds  to  questions  about  the 
process and environmental factors. Certain databases were developed to store information that is not
used directly to populate the simulation but is needed for the output analysis. The most important of
these, the material database, consists of 1) a general section, which contains general information about
the materials being used, including the name, cost, and Material Safety Data Sheet (MSDS) environmental
information, and 2) an option-specific section, which includes information required for each simulation
experiment.
This  includes,  for  example,  the  amount  of  material  used  for  the  original  part  and  the  percentage  of  the  part 
discarded  during  certain  operations.  All  of  the  information  required  for  the  material  database  is  captured 
from  the  interface  questions  developed  for  the  submodels. 




The  second  subsystem  is  the  simulation.  The  simulation  subsystem  uses  information  obtained  from  the  user 
to  develop  and  run  the  simulation  model  and  to  generate  information  needed  for  the  output  subsystem.  This 
includes  developing  submodels  for  all  process  steps  and  creating  system  variables,  which  track  the  required 
information.  Once  the  user  has  completed  the  data  input  and  is  satisfied  with  the  model,  the  simulation  code 
is  generated  and  executed  using  the  WITNESS  [Lanner  Group]  simulation  environment.  SimBuilder  builds 
the  simulation  code  from  a  library  of  submodels.  The  simulation  can  now  be  optimized  using  a  Genetic 
Algorithm  (GA)  approach  as  presented  below  in  the  discussion  on  the  Optimization  module. 


The  output  subsystem  utilizes  information  from  the  user  and  the  simulation  to  generate  material, 
environmental,  quality,  cost,  production,  and  energy  reports.  Six  basic  output  reports  were  developed  to 
effectively  communicate  information  to  the  user.  The  reports  developed  were:  1)  the  material  report  which 
includes  the  amount  of  each  material  that  enters  the  system,  exits  the  system  in  each  output  stream  (good 
parts,  mandrel  prep  scrap,  etc.)  and  remains  in  the  system  as  work  in  process,  2)  the  process  report  (from 
WITNESS)  which  includes  traditional  simulation  output,  3)  the  quality  report  which  includes  the  amount 
and  cost  of  all  scrapped  materials  by  processing  station  and  type  of  material,  4)  the  environmental  report 
which  includes  the  amount  and  cost  of  all  materials  discarded  as  waste,  5)  the  energy  report  which  includes 
the  amount  and  cost  of  the  energy  used  to  produce  the  parts,  and  6)  the  cost  report  which  includes  material, 
energy, and labor costs to produce the parts. Fig. 2 provides a sample output report on environmental
criteria.  This  report  lists  both  material  waste  and  scrap  and  provides  an  average  cost  per  finished  part  for 
each  category.  In  particular,  amounts  of  hazardous  waste  can  be  easily  tracked  and  documented. 


Environmental Report

Waste Type                                 Wt/Finished part   Wt/Week   Cost/Finished part
                                                 Avg            Avg            Avg
Discarded RM's                                   0.98          18.04
Total Amount of Mat'l in Scrap                   5.99         164.31          $5.56
Machine Waste                                    0.00                         $0.00
Cuttings                                         0.00
Waste Resin                                      0.72                         $0.72
Solvent Bottoms                                  1.92                         $1.92
Air                                              2.40          65.71          $2.40
Discarded Mat'ls used for good parts or          2.58
Mandrel Mat'ls                                   1.00
Bag Mat'ls                                       0.42          11.54          $0.01
Mold Release                                     1.00          27.52          $1.00
Mandrels                                         0.15           4.04          $0.00
Total                                           14.58                        $13.12

Environmental Categories   Waste/Finished Part   Waste/Week   Usage/Finished Part   Usage/Week
[illegible]                       0.00              0.00             0.00
[illegible]                       0.00              0.00             0.00
Toxic Chemical                    0.00              0.00             0.00
TRI Chemical                      0.00              0.00
SARA H-1                          0.34              9.23            11.15             305.68
SARA H-2                          0.00              0.00             0.00               0.00
SARA P-3                          0.00              0.00             0.00               0.00
SARA P-4                          0.00              0.00             0.00               0.00
SARA P-5                          0.00              0.00             0.00
Hazardous
Non-hazardous

Fig.  2.  Environmental  Report 
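The material report rests on a material balance: what enters the system must equal what leaves in the output streams plus what remains as work in process. The sketch below shows that bookkeeping for a single material. The function name, the stream names, and all quantities are hypothetical and chosen only for illustration.

```python
def material_balance(entered, exited_by_stream, work_in_process, tol=1e-6):
    """Check that entered = sum(exited streams) + work in process.

    All quantities are weights of one material in the same unit.
    """
    exited = sum(exited_by_stream.values())
    residual = entered - exited - work_in_process
    return {
        "entered": entered,
        "exited": exited,
        "work_in_process": work_in_process,
        "residual": residual,              # unaccounted material
        "balanced": abs(residual) <= tol,
    }

# Hypothetical weekly figures for one resin.
report = material_balance(
    entered=100.0,
    exited_by_stream={"good parts": 80.0, "mandrel prep scrap": 5.0, "waste resin": 10.0},
    work_in_process=5.0,
)
```

A non-zero residual would indicate an output stream that the simulation variables are not tracking.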


Once  a  production  line  has  been  defined  and  before  committing  to  a  simulation  model,  the  user  is  given  the 
option  of  performing  a  static  analysis.  A  static  analysis  is  a  high  level  quantitative  analysis  based  on 
simplifying  assumptions  that  quickly  produces  results  for  the  following  information  on  each  of  the 
manufacturing  operations:  Production  Cycle  Time,  Percent  Down  time,  Effective  Cycle  Time,  and  Shift 
Capacity.  The  static  analysis  of  the  manufacturing  system  can  be  generated  using  the  processing  times  for 
each  component  to  generate  a  maximum  throughput  for  the  line  and  identify  bottlenecks. 
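A static analysis of this kind can be sketched in a few lines. The formulas are assumptions, since the paper does not state them: effective cycle time is taken as the cycle time divided by the fraction of time the station is up, shift capacity as the shift length divided by the effective cycle time, and the bottleneck as the station with the lowest capacity. Station names and figures are hypothetical.

```python
def static_analysis(stations, shift_minutes=480.0):
    """stations: list of (name, cycle_time_minutes, percent_down_time).

    Returns the per-station table and the bottleneck row.
    """
    rows = []
    for name, cycle_time, percent_down in stations:
        uptime = 1.0 - percent_down / 100.0
        effective = cycle_time / uptime          # assumed definition
        rows.append({
            "station": name,
            "cycle_time": cycle_time,
            "percent_down": percent_down,
            "effective_cycle_time": effective,
            "shift_capacity": shift_minutes / effective,
        })
    # The slowest station limits the maximum throughput of the line.
    bottleneck = min(rows, key=lambda r: r["shift_capacity"])
    return rows, bottleneck

rows, bottleneck = static_analysis([
    ("winding", 10.0, 20.0),
    ("cure", 12.0, 0.0),
    ("trim", 5.0, 10.0),
])
```

In this example the winding station is the bottleneck even though the cure step has the longer nominal cycle time, because its down time raises its effective cycle time to 12.5 minutes.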




Optimization 

The  third  module,  Optimization,  is  used  to  improve  the  manufacturing  system  configuration.  Several 
studies  have  shown  that  optimization,  in  conjunction  with  simulation,  is  the  best  way  to  obtain  the 
maximum amount of information while minimizing the amount of resources utilized [5]. A Genetic
Algorithm  approach  has  been  used  in  conjunction  with  the  WITNESS  simulation  model.  The  optimization 
component  takes  as  input  a  WITNESS  simulation  model.  The  genetic  algorithm  adjusts  the  down  times  (the 
frequency  of  a  machine  being  taken  off-line  for  maintenance  or  breakdown),  number  of  machines,  and 
cycle  times  in  the  WITNESS  model  in  order  to  achieve  better  results  over  time.  The  genetic  algorithm  takes 
the  results  of  the  simulation  run  and  evaluates  the  results  in  light  of  a  predefined  fitness  function.  The  user 
can  modify  the  fitness  function  associated  with  the  algorithm;  one  example  is  a  weighted  average  of  profit, 
average  flow,  and  average  work  in  process.  Once  WITNESS  receives  the  necessary  data  from  the  genetic 
algorithm,  it  runs  a  simulation  of  the  manufacturing  line  for  a  pre-determined  amount  of  simulation  time 
(including  a  warm-up  period). 
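The loop between the genetic algorithm and the simulation can be sketched as below. The `simulate` function is a toy stand-in, not WITNESS, and its formulas are invented for illustration. The parameter bounds, the weights, the negative weight on work in process, and the GA operators (tournament selection, uniform crossover, elitism) are likewise assumptions.

```python
import random

# Hypothetical ranges for the three parameter types the GA adjusts.
BOUNDS = {"machines": (1, 5), "cycle_time": (5.0, 15.0), "downtime": (0.0, 0.3)}
# Assumed weights; work in process is penalized.
WEIGHTS = {"profit": 0.5, "flow": 0.3, "wip": -0.2}

def simulate(cfg):
    """Toy stand-in for one simulation run (NOT the WITNESS model)."""
    rate = cfg["machines"] * (1.0 - cfg["downtime"]) / cfg["cycle_time"]
    return {
        "profit": 100.0 * rate - 3.0 * cfg["machines"],
        "flow": rate,
        "wip": cfg["machines"] * cfg["cycle_time"] * cfg["downtime"],
    }

def fitness(cfg):
    """Weighted average of the simulated performance measures."""
    out = simulate(cfg)
    total = sum(abs(w) for w in WEIGHTS.values())
    return sum(WEIGHTS[k] * out[k] for k in WEIGHTS) / total

def random_config(rng):
    return {
        "machines": rng.randint(*BOUNDS["machines"]),
        "cycle_time": rng.uniform(*BOUNDS["cycle_time"]),
        "downtime": rng.uniform(*BOUNDS["downtime"]),
    }

def evolve(generations=20, pop_size=12, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    pop = [random_config(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        children = [dict(pop[0])]                      # elitism: keep the best
        while len(children) < pop_size:
            a = max(rng.sample(pop, 2), key=fitness)   # tournament selection
            b = max(rng.sample(pop, 2), key=fitness)
            child = {k: rng.choice((a[k], b[k])) for k in BOUNDS}  # uniform crossover
            if rng.random() < mutation_rate:
                gene = rng.choice(sorted(BOUNDS))
                child[gene] = random_config(rng)[gene]  # resample one gene
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the best configuration is always carried into the next generation, the best fitness recorded in `history` never decreases; in a real setting each fitness evaluation would be a full simulation run, so the population size and generation count would be chosen with run time in mind.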


CONCLUSIONS 

The  DTAME  has  been  developed  to  aid  designers  in  evaluating  the  incorporation  of  composite  materials  in 
their  design.  DTAME  accomplishes  this  goal  by  consolidating  material  and  process  knowledge  in  a 
simulation  environment  that  integrates  environmental,  cost,  quality  and  production  factors. 

The  primary  modules  of  DTAME  provide  for  a  consistent  methodology  while  allowing  continuous 
improvement  of  the  tools  that  implement  it.  The  critiquing  module,  CDMCS,  assists  the  user  in  determining 
the  viability  of  a  manufacturing  process  for  a  particular  design.  The  simulation  module,  SimBuilder,  models  a 
wide  variety  of  manufacturing  options  through  the  incorporation  of  reusable  simulation  submodels. 
Furthermore,  the  simulation  integrates  material  usage,  environmental  impact  and  direct  costs  in  addition  to 
production  criteria.  The  user  can  select  from  a  variety  of  post  simulation  data  displays  and  analysis  options 
allowing  quality,  environmental,  cost  and  production  reports  for  use  by  the  design  engineer  when  considering 
manufacturing alternatives. Once the simulation models are created in the SimBuilder system, they serve as
input to the genetic algorithm module. This module attempts to generate an optimal solution; the best
solution from each generation can then be simulated to provide a more detailed evaluation.

Research  is  currently  being  conducted  on  data,  information  and  knowledge  version  and  case  control.  The  goal 
of  this  effort  is  to  provide  a  consistent  environment  for  archiving,  searching  and  tracking  the  decision  process 
for  particular  cases.  Ideally  this  evolving  enterprise  knowledge  base  will  be  used  both  to  refine  the  knowledge 
incorporated  into  the  modules  and  to  provide  additional  information  that  will  allow  the  user  to  make  more 
informed  and  better  decisions. 


REFERENCES 

1. Keoleian, G.A. and Menerey, D. (1993). Life Cycle Design Guidance Manual. Ohio Environmental
Protection Agency (EPA 600/R-92/226).

2. U.S. Environmental Protection Agency (1991). Guide to Pollution Prevention: The Fiberglass-Reinforced
and Composite Plastics Industry (EPA/625/7-91/015).

3. Russell, D. (1997). Methodology for Designing Modular Multi-Criteria Discrete Event Simulations.
Dissertation. The University of Alabama in Huntsville.

4. Messimer, S., Henshaw, J., Montgomery, J., and Rogers, J. (1996). Composites Design and Manufacturing
Critiquing Assistant. Artificial Intelligence in Engineering Design and Manufacturing 10, 65-69.

5. Hall, J. and Bowden, R. (1996). Simulation Optimization for a Manufacturing Problem. Proceedings of the
1996 Southeastern Simulation Conference, 135-140.






Models,  Algorithms  and  Decision  Support  Systems 
for  Letter  Mail  Logistics 

Hans-Jürgen Sebastian

RWTH Aachen, Operations Research Group, Aachen, Germany

ABSTRACT 

The  reorganization  of  the  Deutsche  Post  AG  imposed  massive  structural  and  organizational  changes.  These 
changes strongly influence the design and operations of the logistic network. Here we will focus on the
so-called main transportation network. It consists of the network connecting 83 letter mail centers distributed
throughout  Germany  as  well  as  the  international  letter  mail  center  at  the  Frankfurt  am  Main  airport.  The 
planners  have  to  decide  how  an  average  of  about  1,500  tons  of  letter  mail  is  transported  between  the  letter 
mail  centers  each  night.  Moreover  the  system  should  be  able  to  deal  with  special  situations,  such  as  the 
strong  quantity  increases  before  Christmas. 

The  Deutsche  Post  AG  and  the  ELITE  Foundation  are  implementing  a  Decision  Support  System  for  this 
specific  planning  task  in  cooperation  with  the  RWTH  Aachen.  The  system  is  currently  running  at  the 
Deutsche  Post  AG's  branch  in  Bonn  and  is  simultaneously  being  extended  and  improved.  It  is  based  on  a 
client-server  architecture,  including  a  Geographical  Information  System  (GIS),  a  relational  data-base 
system,  a  Graphical  User  Interface  (GUI),  and  a  number  of  optimization  algorithms  for  different  planning 
tasks. 

In order to reduce the complexity of the planning problem, it was decided to divide the problem into
sub-problems, which are solved sequentially. There is, of course, always the possibility to backtrack to an
earlier planning stage, if necessary. The sub-problems include the night airmail network, the
ground-feeding transportation design, the road network and hub design problem, hub vehicle scheduling, and the
direct loading vehicle scheduling problem.

The  process  of  letter  mail  collection  can  be  roughly  divided  into  five  sub-processes:  first,  the  letter  mail  is 
collected  from  the  mailboxes  at  each  letter  mail  center.  It  is  then  sorted  according  to  the  destination  letter 
mail center. This process finishes at about 21:15. In the third step, the letter mail is transported
between the letter mail centers. This process has to be finished no later than 4:15 the next morning, resulting
in  a  transportation  time  window  of  about  7  hours.  However,  because  of  sorting  capacities,  it  must  be 
guaranteed  that  the  letter  mail  arrives  almost  continuously  during  the  7-hour  period,  effectively  making  the 
problem  an  inventory-routing  design  problem. 

The  incoming  letters  are  sorted  in  correspondence  to  their  local  destination  at  every  letter  mail  center. 
Eventually,  the  letter  mail  is  transported  to  pick-up  points  where  it  is  collected  by  the  postman.  The  tight 
time window constraint forces a fraction of about 20% of the letter mail to be transported using the
so-called night airmail network. However, it might be better to increase this fraction in order to save road
transportation  costs.  The  assignment  of  letter  mail  to  either  the  night  airmail  network  or  the  road  network  is 
in  itself  an  optimization  problem,  which  can  only  be  solved  by  consideration  of  the  entire  transportation 
network. 

For  a  given  quantity  of  letter  mail,  optimization  of  the  night  airmail  network  consists  of  assigning  the  letter 
mail  for  each  origin-destination  pair  to  a  flight.  A  flight  is  defined  by  its  take-off  and  landing  airport,  the 
take-off  time,  and  the  type  of  aircraft.  It  can  be  shown  that  only  a  limited  number  of  take-off  times  have  to 
be  considered,  thus  reducing  the  number  of  possible  flights.  This  observation  leads  to  a  model  that  is 
similar  to  a  warehouse-location  (or  facility-location,  plant-location)  model.  However,  its  size  (several 
thousands  of  warehouses  and  customers)  requires  heuristic  methods  to  be  used  effectively.  We  have 
developed  a  Tabu  Search  algorithm,  which  iteratively  calls  the  commercial  mixed-integer  solver  CPLEX. 
The  results  indicate  strong  potential  for  savings  compared  to  the  existing  solution. 
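To illustrate the warehouse-location view of flight selection, the sketch below runs a basic tabu search on a tiny uncapacitated instance: candidate flights play the role of warehouses with a fixed cost, and origin-destination pairs play the role of customers assigned to the cheapest open flight. It is a simplified stand-in under stated assumptions. It evaluates each neighbor directly instead of calling CPLEX, it ignores capacities and time windows, and the cost data are hypothetical.

```python
import math

def total_cost(open_flights, fixed, assign):
    """Fixed cost of open flights plus cheapest assignment of each OD pair."""
    if not open_flights:
        return math.inf
    return (sum(fixed[j] for j in open_flights)
            + sum(min(row[j] for j in open_flights) for row in assign))

def tabu_search(fixed, assign, iterations=50, tenure=3):
    """Flip one flight open/closed per iteration; recently flipped flights are tabu."""
    n = len(fixed)
    current = set(range(n))                 # start with every flight open
    best, best_cost = set(current), total_cost(current, fixed, assign)
    tabu_until = [0] * n
    for it in range(1, iterations + 1):
        candidates = []
        for j in range(n):
            cost = total_cost(current ^ {j}, fixed, assign)
            # Aspiration: a tabu move is allowed only if it beats the best known cost.
            if tabu_until[j] >= it and cost >= best_cost:
                continue
            candidates.append((cost, j))
        if not candidates:
            continue
        cost, j = min(candidates)           # best admissible move, even if worsening
        current ^= {j}
        tabu_until[j] = it + tenure
        if cost < best_cost:
            best, best_cost = set(current), cost
    return best, best_cost

# Hypothetical instance: 3 candidate flights, 4 origin-destination pairs.
fixed = [4, 3, 4]
assign = [
    [1, 4, 6],
    [2, 3, 6],
    [6, 3, 2],
    [6, 4, 1],
]
best, best_cost = tabu_search(fixed, assign)
```

Accepting the best admissible move even when it worsens the current solution is what lets the search leave local optima; the tabu list prevents it from immediately undoing that move.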






Ground-feeding  of  the  night  airmail  network  is  performed  by  vehicles  of  different  size  and  speed.  The 
objective  is  to  design  vehicle  trips  from  the  letter  mail  centers  to  the  airport  and  back,  so  that  on-time 
delivery  of  letter  mail  at  the  airport  is  guaranteed. 

There is the possibility of passing by other letter mail centers on the way to the airport and/or picking up
additional mail in order to save transportation costs. The number of additional pick-ups is limited, however,
to two or three due to the time-window constraints.

We  have  chosen  to  generate  a  large  number  of  tours  and  to  model  the  problem  as  a  set-covering  problem. 
The  model  is  solved  by  a  Lagrangean  heuristic  for  set-covering.  Most  letter  mail  quantities  between  the 
different  letter  mail  centers  are  rather  small.  Therefore,  it  is  desirable  to  consolidate  the  letter  mail  into  a 
hub  system.  On  the  other  hand,  hub  consolidation  is  time-consuming  and  rather  costly  due  to  sorting  costs  - 
especially  since  the  sorting  process  is  performed  manually.  Due  to  the  requirement  of  almost  continuous 
input,  it  is  only  feasible  to  delay  a  fraction  of  the  mail  by  hub  consolidation.  Moreover,  process  feasibility 
can  only  be  checked  if  the  entire  system  is  considered  simultaneously.  We  have  developed  an  Evolutionary 
Algorithm  which  performs  this  optimization,  i.e.,  whether  to  transport  by  hub  consolidation  or  direct  loads, 
based  on  modification  of  vehicle  schedules.  Letter  mail  is  then  re-assigned  to  the  modified  schedules  in  the 
algorithm.  The  algorithm  is  able  to  improve  initial  solutions  considerably. 

The  hub  and  direct  loading  vehicle  scheduling  problems  are  similar  to  the  ground-feeding  problems  and  can 
again  be  solved  by  a  set-covering  approach.  However,  due  to  the  size  of  the  problem,  heuristic 
preprocessing  methods  have  to  be  employed. 
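The set-covering formulation used for tour selection can be illustrated with a small example: each generated tour has a cost and serves a set of letter mail centers, and a set of tours must be chosen so that every center is served. The sketch below uses the classic greedy heuristic (lowest cost per newly covered center), which is a simpler technique than the Lagrangean heuristic named in the text. Tour names, costs, and coverage are hypothetical.

```python
def greedy_set_cover(centers, tours):
    """tours: dict mapping tour name -> (cost, set of centers served).

    Repeatedly picks the tour with the lowest cost per newly covered center.
    """
    uncovered = set(centers)
    chosen = []
    while uncovered:
        best = None
        for name, (cost, served) in tours.items():
            gain = len(served & uncovered)
            if gain == 0:
                continue
            ratio = cost / gain
            if best is None or ratio < best[0]:
                best = (ratio, name)
        if best is None:
            raise ValueError(f"no tour serves: {sorted(uncovered)}")
        chosen.append(best[1])
        uncovered -= tours[best[1]][1]
    return chosen

# Hypothetical tours from four letter mail centers to the airport.
tours = {
    "t1": (10, {"A", "B"}),
    "t2": (6, {"C"}),
    "t3": (12, {"B", "C", "D"}),
    "t4": (5, {"D"}),
}
selected = greedy_set_cover({"A", "B", "C", "D"}, tours)
total = sum(tours[name][0] for name in selected)
```

The greedy rule gives a feasible cover quickly but no bound on how far it is from optimal; a Lagrangean approach additionally yields a lower bound that indicates the remaining gap.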

The  Decision  Support  System  also  contains  a  large  number  of  methods  for  cost  and  resource  analysis, 
which  support  manual  modification  of  the  computer-generated  plans.  These  methods  have  increased  the 
user-acceptance  of  the  system  and  enable  users  to  gain  an  impression  of  the  system-wide  consequences  of 
their  decisions. 




Intelligent  Processes  for  Production  Control 

Edson  Pacheco  Paladini 

Universidade  Federal  de  Santa  Catarina 
EPS  /  CTC  -  CP  476  -  Trindade 
88040-970, Florianopolis, SC, Brazil


ABSTRACT 

This paper presents an intelligent system which makes it possible to control raw material flow (a typical
production control problem) automatically. The system is supported by Artificial Intelligence devices, more
specifically by two Expert Systems which determine on a case-by-case basis whether raw material must be
allowed straight into the factory without inspection or must go through some kind of inspection.
Should  the  material  be  inspected,  it  must  be  sent  either  to  a  reposition  department,  where  perfect  pieces  will 
replace  defective  ones,  or  to  a  general  lot  analysis  department,  where  in  case  of  lot  rejection,  the  material 
must  be  re-inspected  or  returned  to  suppliers.  An  experimental  use  of  this  system  is  reported  and  the 
preliminary  results  are  analyzed. 


INTRODUCTION 

This  paper  presents  a  system  applied  to  wall  tile  factories  and  deals  with  an  automatic  process  for  raw 
material  flow  organization  in  those  companies.  Due  to  its  nature  and  generality,  however,  this  study  can  be 
extended  to  other  companies,  since  the  problems  analyzed  here  are  rather  frequently  observed. 

In  general  terms,  this  paper  refers  to  an  automatic  system  of  decisions  about  the  factory’s  incoming  raw 
material.  Two  areas  are  structured  and  in  each  of  them  automatic  decisions  are  made  about  the  flow  of  raw 
material  arriving  at  the  factory.  Expert  Systems  are  used  as  basic  decision  modules. 


GENERAL  INSPECTION  PROCEDURES  OF  RAW  MATERIAL 

The raw material reception area in a factory is usually divided into two areas. In the first
one, all of the raw material arriving at the factory undergoes a preliminary analysis called Inspection
Control. Here one decides whether the raw material ought to be inspected or not. Raw material not
requiring inspection is allowed straight into the factory. Raw material which requires inspection follows on
to the second area. In this second area, called Inspection Selection, one decides whether the lot must be
inspected so that defective pieces are replaced by perfect ones (rectifying inspection), or whether the lot
analysis is to decide only whether the lot will be released for use or returned to its supplier (inspection
for acceptance).

For the purpose of this study, quality inspection is regarded as the process aimed at determining whether a
given piece, sample or lot complies with pre-established quality specifications [1]. Thus, inspection
evaluates the quality level of a certain part or a set of pieces, comparing each piece with a predetermined
standard. The inspection aims essentially at providing a diagnosis of the product in terms of its
quality level [2]. Such a diagnosis is always centered upon the quality characteristic, which consists of each
and every elementary property that the product must possess in order to allow it to work in full compliance
with its project as well as with the function it was designed to perform.

It  can  be  seen  that  in  both  areas  there  are  decisions  related  to  raw  materials  flow.  In  the  first  case,  such 
decisions  have  to  do  with  sending  the  raw  material  straight  to  the  assembly  line  or  to  an  inspection  process. 
In  the  second  case,  these  decisions  involve  (1)  allowing  the  lot  in  the  assembly  line,  now  as  the  result  of  an 
inspection  process,  or  (2)  returning  it  to  the  supplier.  For  each  case,  we  have  developed  and  applied  an 
Expert  System  to  make  the  decision  required. 
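
The two-stage flow described above can be sketched as a small routing function. This is only an illustrative sketch: the names `Route` and `route_lot` and the two boolean inputs are hypothetical stand-ins for the outcomes of the two Expert Systems and do not come from the paper.

```python
from enum import Enum


class Route(Enum):
    """Possible destinations of an incoming lot (hypothetical names)."""
    DIRECT_TO_FACTORY = "released without inspection"
    RECTIFYING_INSPECTION = "inspect and replace defective pieces"
    ACCEPTANCE_INSPECTION = "inspect, then release or return to supplier"


def route_lot(needs_inspection: bool, rectify: bool) -> Route:
    """Route a lot through the two reception areas.

    needs_inspection -- outcome of area 1 (Inspection Control)
    rectify          -- outcome of area 2 (Inspection Selection);
                        only consulted when inspection is required
    """
    if not needs_inspection:
        return Route.DIRECT_TO_FACTORY
    if rectify:
        return Route.RECTIFYING_INSPECTION
    return Route.ACCEPTANCE_INSPECTION
```

In this sketch the second decision is ignored whenever the first area releases the lot, which mirrors the text: only lots that require inspection reach the second area.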


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




INSPECTION  CONTROL  AREA 

The basic decision in the first raw material reception area is whether a given raw material that has just
arrived needs to be inspected. To make decisions in this area, a Decision Support Expert System was
developed which determines whether or not inspection procedures for the materials received are in fact
necessary.

The Expert System in question makes use of a basic study previously developed to determine if inspection
is really justifiable for some pieces or specific situations. This question stems, first and foremost, from what
the inspection is intended to do: fundamentally, it provides a diagnosis of the process, detecting defects,
identifying situations of non-compliance, analyzing cases of non-fulfillment of basic functional requirements
and also carrying out particular evaluations of the product's quality characteristics along its different
manufacturing stages.

The concept underlying the Expert System is simple. In general terms, inspection is deemed justifiable if it
fits within a broader process, being thus seen as a simple support activity. Its adequacy to the control
strategies or to the process evaluation methodologies is then essential in determining whether it must be
carried out or not.

Carrying out an inspection is justifiable only after the criterion described above has been met, i.e.,
that the inspection fits within a broad quality evaluation process, so that its results can be analyzed and
taken into consideration when the general actions of the Quality System are defined.

The  objectives  of  the  inspection  ought  to  be  simultaneously  considered  with  this  general  criterion.  If  what 
we  seek  is  only  suppliers’  quality  evaluation,  inspection  may  not  be  the  most  appropriate  means  of 
obtaining  such  information,  since  it  provides  more  specific  considerations  and  emphasizes  particular  aspects 
of  pieces.  However,  a  whole  group  of  inspections  duly  put  together  and  analyzed  could  serve  that  purpose  - 
which  would  not  be  true  for  individual  inspections. 

Together with these broad and sometimes rather generic guidelines, other more specific aspects can be taken
into account. These considerations, fully consistent with the general criteria described above, describe
well-characterized practical situations, likely to be found in a large number of products and processes, in
which inspection is highly recommended, and others in which it simply is not reasonable to carry it out.

Inspection cost in view of the importance of a given piece is one such aspect. If inspection cost is too
high, inspection is not justifiable. In this case, control could be carried out by some subsequent activity in
the productive process or by testing a given set that includes the piece in question. A combined analysis would
thus compensate for the high inspection costs of individual items.

A related aspect has to do with the cost of the unsatisfactory product. If this cost is excessive, inspection
should be carried out. Otherwise, it probably should not. Whenever the phases in which the raw material is
immediately used involve covering operations or alterations to the face or external features of the piece,
inspection is justified. In more general terms, if the following operation in the process is of extreme
importance for the product, inspection is required.

There are cases where inspection is necessary to perform essential tasks related to raw material analysis.
This happens when inspection is used for classifying pieces, for instance. The same situation takes place if
the product has many characteristics to be controlled. On the other hand, inspection ceases to be relevant if
rejection of the product does not affect the decision to use it. In this case, testing the product is
not justifiable, since such an evaluation results in no change in its actual use. It is practically the same
as making no use of the inspection results. If such results are not taken into consideration, there is no reason
to obtain them and, hence, no reason to carry out the inspection.

The inspection can also be considered within the context of the productive process as a whole. We may
choose not to proceed with an inspection where the supplier's history shows a high performance or where
techniques of Statistical Process Control determine that the productive process is under control, in
full compliance with the specifications of the project [3]. In such cases, if the capability value of the process
is reliable and meets the specifications of the piece, inspection may, at least, be reduced. However, if the
evaluation of a supplier's previous data reveals a proneness to produce defects that become more serious in
the following phases or simply propagate along them, inspection is then recommended.

In view of the specificity observed, we detected the need to design a Decision Support Expert
System which makes it possible to determine the best option to be adopted in a given situation, where it
becomes necessary to decide effectively whether or not an inspection should be carried out. This is the aim
of the present system, which compares the benefits and restraints of carrying out an inspection at this point
in the process and defines the posture to be adopted. It is noteworthy that Expert Systems have been used
successfully in other areas of Quality Management [4,5].


FLOW  CONTROL  SYSTEM  1 

This is a rule-based Expert System with 66 rules and 30 qualifiers. The system can list all the
qualifiers as well as the rules in which the choices were used. In this case, the choices appear in all the rules
used for the decision. The decision of the Expert System is whether or not to carry out the inspection.
The scale of values used by the system is made up of integer values ranging from 0 to 10. The adequacy of
the option chosen is made evident when values close to 10 are given to it; its inadequacy is characterized by
values close to 0. All possible rules are deployed in deriving data for the selection of the most suitable
choice. By default, the system does not show the rules while they are being used, but the user may alter
this option. As an example of a rule we have:

(Rule 24)

IF rejection of a product precludes its use,

THEN Inspection should be carried out - Probability: 9/10.

THEN Inspection should not be carried out - Probability: 1/10.

As  an  example  of  a  qualifier  we  have: 

(Qualifier 21)

The immediate phase of use of raw material

(1) is costly because it uses expensive materials;

(2) is irreversible;

(3) implies high execution costs;

(4) does not have special characteristics.
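
As a rough illustration of how rules and qualifiers of this kind might interact, the sketch below encodes Rule 24 as quoted above together with one hypothetical rule attached to option (2) of Qualifier 21. The paper does not state how the values of the fired rules are combined; averaging is assumed here purely for illustration, and the identifiers `RULES`, `evaluate`, `rejection_precludes_use` and `q21_immediate_use_phase`, as well as the values 8 and 2, are invented.

```python
from statistics import mean

# Each rule maps a (qualifier, selected option) condition to a value
# out of 10 for each of the two possible choices.
RULES = [
    # Rule 24 from the text: rejection of a product precludes its use.
    {"if": ("rejection_precludes_use", True),
     "then": {"inspect": 9, "do_not_inspect": 1}},
    # Hypothetical rule on Qualifier 21, option 2 ("is irreversible");
    # the values 8 and 2 are invented for illustration.
    {"if": ("q21_immediate_use_phase", 2),
     "then": {"inspect": 8, "do_not_inspect": 2}},
]

CHOICES = ("inspect", "do_not_inspect")


def evaluate(answers: dict) -> dict:
    """Average the 0-10 values of every rule whose condition matches.

    Averaging is an assumption; the source does not give the formula.
    Returns None for each choice when no rule fires.
    """
    fired = [rule["then"] for rule in RULES
             if answers.get(rule["if"][0]) == rule["if"][1]]
    if not fired:
        return {choice: None for choice in CHOICES}
    return {choice: mean(t[choice] for t in fired) for choice in CHOICES}


scores = evaluate({"rejection_precludes_use": True,
                   "q21_immediate_use_phase": 2})
print(scores)  # {'inspect': 8.5, 'do_not_inspect': 1.5}
```

Under this assumed combination rule, a value close to 10 for `inspect` corresponds to what the text calls an adequate option.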

Most  of  the  rules  have  bibliographical  references,  providing  them  with  a  conceptual  background.  Some 
rules  also  have  explanatory  notes  as  to  their  formulation  or  concepts  therein  included.  The  system  is  made 
up  of  5  basic  areas  involving  analyses  related  to  the  nature  of  the  inspection,  of  the  product,  of  the  process 
and  of  the  lots,  as  well  as  a  quality  level  analysis  of  the  process.  In  broad  terms,  each  area  involves  the 
following  aspects,  amongst  others:  (a)  As  to  the  nature  of  the  inspection:  inspection  cost  levels  in  view  of 
the  importance  of  the  piece;  inspection  efficacy  level;  general  objectives  of  the  inspection;  nature  of  the 
tests  for  carrying  out  the  inspection;  effects  of  the  inspection  on  specific  phases  of  the  process;  defect 
occurrence  possibility;  necessity  or  convenience  of  classifying  the  pieces;  (b)  as  to  the  nature  of  the  product: 
characteristics  of  the  product  to  be  controlled;  consequences  of  rejecting  a  defective  product;  cost  of 
products  non-compliant  with  the  project;  relation  between  defect  occurrence  and  manufacturing  phases  of 
the  product  (e.g.  probability). 


INSPECTION  SELECTION  AREA 

The raw materials released by the Area 1 Expert System are forwarded straight to the assembly lines
without inspection. The others are submitted to a second Decision Support Expert System, which
determines the most suitable choice between quality inspection only for acceptance
(or rejection) of raw material lots and quality inspection for lot rectification.




It is worth pointing out that the decision here involves the purpose of the inspection, which can be of
two types: lot inspection exclusively for acceptance (or rejection), and inspection for correction, which
upgrades the quality level of a given lot and therefore alters its value. The first case consists of inspection
for acceptance: inspection is aimed only at detecting defective pieces in a lot to determine whether the lot
should be accepted in its entirety or rejected, considering for that purpose a maximum allowed number of
defective pieces. Thus, this type of inspection is limited to accepting or rejecting the lot based on the
analysis of a sample taken from it. Acceptance implies releasing the lot for use; rejection means that it
should be returned to the supplier. This type of inspection is called 'inspection for acceptance' since it
consists only of an evaluation to determine what to do with the lot: accept it (which means releasing it for
effective use in the factory) or reject it (which means returning it to the supplier).

The second type involves rectifying inspection. If we do not want to return the whole lot, we may carry out
an inspection aiming at replacing defective pieces by perfect ones. In this case, we work on a sample of the
lot initially. Each defective piece found in the sample is replaced by a perfect piece. If the number of
defective pieces is lower than a given limit, the lot is then accepted and released for use. Here, only the
defective pieces from the sample are replaced. If, however, the number of defective pieces exceeds
the pre-established limit, then the whole lot is inspected, with replacement of all the defective pieces
by perfect ones. This is what we call rectifying inspection.
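
The two inspection types described above can be contrasted in a short sketch. It rests on several assumptions that are not in the paper: a piece is modeled as a boolean (True means defective), the sampled positions are passed in explicitly, a defect count equal to the limit is treated like a count above it, and detection is idealized as perfect. The function names are hypothetical.

```python
import random


def count_defects(lot, sample_idx):
    """Number of defective pieces (True values) among the sampled positions."""
    return sum(1 for i in sample_idx if lot[i])


def acceptance_inspection(lot, sample_idx, limit):
    """Accept or reject the whole lot from the defect count in the sample.

    The lot itself is never modified.
    """
    if count_defects(lot, sample_idx) < limit:
        return "accept"
    return "return to supplier"


def rectifying_inspection(lot, sample_idx, limit):
    """Return a rectified copy of the lot.

    Defective pieces found in the sample are always replaced. If the
    sample holds `limit` or more defects (assumed boundary), the whole
    lot is inspected and every defective piece is replaced; detection
    is idealized as perfect here.
    """
    rectified = list(lot)
    defects = count_defects(lot, sample_idx)
    for i in sample_idx:
        rectified[i] = False
    if defects >= limit:
        rectified = [False] * len(lot)
    return rectified


lot = [False, True, False, True, True, False]
sample_idx = random.sample(range(len(lot)), 3)  # random sample of 3 pieces
print(acceptance_inspection(lot, sample_idx, limit=2))
print(sum(rectifying_inspection(lot, sample_idx, limit=2)), "defects remain")
```

When the sample passes, defective pieces outside the sample stay in the lot, so even this idealized sketch does not guarantee a defect-free lot.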

There  is  a  fundamental  difference  between  these  two  types  of  inspection.  Inspection  for  acceptance 
determines  the  quality  level  of  the  lot,  but  it  does  not  go  any  further  than  that,  whereas  rectifying  inspection, 
in  addition  to  determining  the  quality  level,  makes  it  better  by  means  of  replacement  of  defective  pieces  by 
perfect  pieces.  Of  course  rectifying  inspection  shows  the  same  problems  as  a  complete  inspection,  i.e.,  there 
is  no  guarantee  that  all  the  defective  pieces,  whether  from  the  sample  or,  in  case  of  rejection  of  this  sample, 
from  the  whole  lot,  will  be  effectively  detected  and  replaced. 

Therefore, it is said that rectifying inspection tends to improve lot quality, although it is not guaranteed that
at the end of the rectifying process the lot will have a 0% rate of defective pieces. There are two reasons for
this: when the sample is accepted, the rest of the lot is not analyzed; and when the original sample is
rejected, there is a natural practical difficulty in detecting all of the defective pieces of the lot.


FLOW  CONTROL  SYSTEM  2 

This is a rule-based Expert System with 47 rules and 22 qualifiers. The characteristics of the
system are the same as those of System 1. Thus, for instance, the system can list all the qualifiers as well as
the rules where they are being used. It can also show all the rules in which the choices were used. In this
case, the choices appear in all the rules used for making the decision. There are two options for decisions
here: Inspection for Acceptance or Rectifying Inspection. Here too the adequacy of the option chosen is
made evident when values close to 10 are given to it; its inadequacy is characterized by values close to 0.
As an example of a rule we have: IF there are perfect pieces in stock and at low cost, THEN Inspection for
acceptance - Probability: 2/10; Rectifying Inspection - Probability: 7/10 (Rule 25). As an example of a
qualifier we have: The inspection is carried out in terms of (1) raw material from various suppliers and
easily available; (2) raw material from various suppliers and of difficult availability; (3) raw material from
exclusive suppliers (Qualifier 28). As in the previous system, most of the rules have bibliographical
references, providing them with a conceptual background. Some rules also have explanatory notes as to
their formulation or the concepts they include.

The system is made up of 4 basic areas involving analyses related to the nature of the inspection, of the
process and of the lots, and it also takes into account the suppliers and raw materials. In broad terms, each
area involves the following aspects, amongst others: (1) as to the nature of the inspection: role played by
the inspection in the quality of the process; actions resulting from the inspection; general objectives and
emphasis given by the inspection; scope of the inspection in relation to the productive process; areas of action
of the inspection; (2) as to the nature of the process: evaluation of the supplier's average quality level;
general characteristics of production planning and control; stocking structure; (3) as to the nature of the
lots: relation between lots and samples; use of lots of pieces after the quality evaluation decision; (4) as to
suppliers and raw materials: relationship with suppliers in terms of quality control of the lots purchased and
raw material replacement levels.


PRACTICAL  APPLICATION 

The Expert Systems were deployed in four wall tile factories in 24 material reception situations. Altogether,
56 decisions had to be made. A group of 'experts' monitored the decisions made by the system (these
experts were in fact 5 students from the graduation programs participating in technical training at the
company and 7 employees of the company working as supervisors and involved with raw material
reception). They considered 53 decisions to be correct. Two mistaken decisions were made in System 1;
these did not prove problematic, since the system requested an inspection that could have been left out:
costs were raised, but without problems for the use of the raw material in the factory. There was also 1
mistaken decision in System 2, where, given the cost of the stock of perfect pieces, inspection for
acceptance would have been more appropriate.
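
The counts reported above can be cross-checked with a few lines of arithmetic. The variable names are illustrative only, and the agreement rate is derived from the reported figures rather than stated in the paper.

```python
total_decisions = 56
errors = {"system_1": 2, "system_2": 1}  # mistaken decisions reported per system

correct = total_decisions - sum(errors.values())
agreement_rate = correct / total_decisions

print(f"{correct}/{total_decisions} = {agreement_rate:.1%}")  # 53/56 = 94.6%
```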


CONCLUSION 

In view of the results obtained with the experimental implementation of the systems, they were considered
adequate for the various cases studied. It is important to remark that in some situations the systems
were tested at specific moments where, due to the characteristics considered, a certain result from the
processing was expected. The decision determined beforehand was the same as that made by the system in
all the cases studied (there were 14 cases within this context and 14 correct decisions).

From the results of the application of the Expert Systems, a list could be drawn up of a series of actions
complementary to the inspection, which make it possible to optimize the operation of the quality evaluation
system as a whole. Such operations are determined by the results reached in the two systems described above.

The application of the systems can further be evaluated by monitoring the results of their
successive applications. Experimental implementations show that their decisions change according to
well-defined factors. A change in the results is thus always due to changes in these factors, which can and
must be monitored, because they indicate situations requiring control, almost always preventive.

Tests carried out on the operation of the systems showed that their sensitivity is high and that their results
can be altered by slight changes in the selections made when answering the qualifiers. This approach,
however, could be impaired if the whole system had to be reprocessed, which would result in efficiency loss.
This does not happen, since the system has devices which allow only some qualifiers to be altered.
The system stores the previous result as well as all the decisions made. It then shows the
new results, making it possible to compare both sets of solutions. From the evaluation of the answers offered
in the two runs and the comparative analysis of the results comes the information needed to determine the
actions to be taken. Additionally, by using the 'WHY' device, the direction being given to the analysis by the
system can be observed, and so can the elements being stressed.

In short, the effect that changes in the input data have on the results is tested as follows:
selections in some qualifiers are altered and the others remain unchanged. Next, data are processed
according to this new situation and the effect of the changes is observed in the final result. The values of the
previous decisions are kept in a record for comparison with the new values. This procedure has
an extra advantage: selections made in the qualifiers can be created and analyzed based on the results
obtained. We can thus determine the relevance (relative weight) of a decision in obtaining a result and give
it more or less emphasis, according to its influence on the processing of the system.
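
The what-if procedure just described (alter a few qualifier selections, reprocess, compare against the stored previous result) can be sketched as follows. Only Rule 25 of System 2 is taken from the text; the other two rules, their values, the averaging of fired rules, and all identifiers (`RULES`, `evaluate`, `what_if`, `replacement_stock`, `supplier`) are assumptions made for illustration.

```python
from statistics import mean

RULES = [
    # Rule 25 of System 2, as quoted in the text.
    (("replacement_stock", "in_stock_low_cost"),
     {"acceptance": 2, "rectifying": 7}),
    # Hypothetical counterpart rule; values invented for illustration.
    (("replacement_stock", "costly"),
     {"acceptance": 8, "rectifying": 3}),
    # Hypothetical rule on a second qualifier that stays unchanged.
    (("supplier", "exclusive"),
     {"acceptance": 6, "rectifying": 5}),
]


def evaluate(answers):
    """Average the values of all fired rules (assumed combination method).

    Assumes at least one rule fires for the given answers.
    """
    fired = [values for (qualifier, option), values in RULES
             if answers.get(qualifier) == option]
    return {choice: mean(v[choice] for v in fired)
            for choice in ("acceptance", "rectifying")}


def what_if(answers, changes):
    """Re-evaluate with only the given qualifiers altered.

    Returns the previous result, the new result, and their difference.
    """
    previous = evaluate(answers)
    current = evaluate({**answers, **changes})
    delta = {choice: current[choice] - previous[choice] for choice in previous}
    return previous, current, delta


baseline = {"replacement_stock": "costly", "supplier": "exclusive"}
previous, current, delta = what_if(
    baseline, {"replacement_stock": "in_stock_low_cost"})
print(previous, current, delta)
```

With these invented values, changing the single stock qualifier moves the recommendation from inspection for acceptance to rectifying inspection, and `delta` gives a crude measure of that qualifier's relative weight.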




Here is a simple example. In the operation of System 2, inspection for acceptance is indicated for lots of a
certain supplier. In this case, having replacement stocks is unfeasible and, by applying rule two of the system,
building such stocks is deemed costly. This factor, together with others, determines inspection for
acceptance. A change of supplier was, however, made and now the pieces are bought from a place
much closer to the company. In accordance with a legal contract, the new supplier keeps stocks of
replacement pieces available, so the defective pieces are quickly replaced. Additionally, it is worth
mentioning that, based on this, it is possible to equip control procedures with means to change the final
quality of the lots inspected. Thus, rules 1, 3 and 12 have their options altered, i.e., new decisions are made
for the corresponding qualifiers. The system then starts to recommend rectifying inspection, altering its
original proposal. It becomes clear that the new situation offers undeniable advantages over the previous one;
nevertheless, its use became convenient only because some working conditions of the productive process
changed.

The  monitoring  of  these  alterations  by  the  Expert  System  allowed  the  change  of  lot  evaluation  procedures 
with  evident  benefits  for  the  company.  It  also  highlighted  the  benefits  of  the  application  of  intelligent 
processes  of  decision  for  the  control  of  the  raw  material  flow  of  the  company. 


REFERENCES 

1.  Aft,  L.  S.  1996.  Industrial  Quality  Control.  Reading,  Addison-Wesley. 

2. Cullen, J. & Hollingum, J. 1987. Implementing Total Quality. IFS Publications. NY, Springer-Verlag.

3. Tenner, A. R. & DeToro, I. J. 1992. Total Quality Management. Reading, Mass., Addison-Wesley.

4.  Dagli,  C.  H.  1990.  Expert  Systems  for  Selecting  Quality  Control  Charts.  USF  Report.  P.  325-343. 

5.  Dagli,  C.  H.  &  Stacey,  R.  1988.  A  Prototype  Expert  System  for  Selecting  Control  Charts.  Int.  Journal  of 
Production  Research.  26(2),  987-996. 




Fuzzy  Systems  I 




Industrial  Applications  of  Fuzzy  System  Modeling 

I.B.  Turksen 

Information  Intelligent  Systems, 

Department  of  Mechanical  and  Industrial  Engineering, 
University  of  Toronto,  Toronto,  Ontario,  M5S  3G8  Canada 
Email:  turksen@mie.utoronto.ca 


ABSTRACT 

Aggregate  industrial  system  behaviour  models  can  be  built  with  fuzzy  data  mining  provided  historical 
system  behaviour  data  is  available  from  system  databases.  Given  the  input-output  data  vectors,  a  unified 
system  modeling  approach  can  be  used  to  extract  "hidden  rules"  of  system  behaviour  using  fuzzy 
technology. In particular, fuzzy cluster analysis could be used with unsupervised learning to extract fuzzy
set membership functions and the fuzzy rule structures. A parametric reasoning method combined with
supervised learning with minimum error criteria could determine combination operators. This eliminates
the arbitrary choice of t-norms and t-conorms that are required in the execution of approximate reasoning
algorithms. Examples include continuous caster scheduling in steel making with criteria of minimum
tardiness and minimum mixed grade steel production. This methodology could also be applied to
pharmacological analysis of experimental data for alcohol dependence, lithium retention, sertraline
influence on alprazolam, etc.


INTRODUCTION 

The motivation of this paper centers on the need to provide a management team of experts with
aggregate system modeling, analysis, diagnosis, prediction, and decision support, and hence an overall view
of the operating conditions for an industrial system and its behaviour patterns under the current structure of
operating procedures. The hypothesis is that generally the actual structure of the operating rules is hidden
due to complex interactions of a large number of variables and various procedures implemented by
managers that affect the performance measure(s).

Furthermore,  it  is  hypothesized  that  such  interactions  are  not  well  describable  in  a  crisp  way  and  that  such 
interactions  generally  demonstrate  highly  non-linear  and  fuzzy  patterns  that  can  be  elicited  via  fuzzy 
system  structure  identification.  It  should  be  noted  that  in  complex  systems  there  are  at  least  two  categories 
of  variables:  (1)  those  that  are  objectively  measurable  and  have  a  direct  impact  on  the  operations,  for 
example,  once  a  schedule  is  fixed  in  a  steel  plant,  width  of  continuous  casting  strand  must  be  adjusted  to 
various  customer  order  specifications,  etc.,  and  (2)  those  that  are  subjectively  measurable  and  have  an 
indirect  but  definite  influence  on  the  system  activities  such  as  the  sequence  of  operations,  for  example,  the 
priority  of  the  customer  orders,  etc.  Hence,  the  interactions  of  these  two  categories  of  variables  together 
with  the  procedures  with  which  they  are  implemented  by  the  managers  are  rather  complex  and  appear  to  be 
highly  non-linear. 

Therefore, for the analysis of systems, our goal is to propose a fuzzy system modeling approach that would
identify the structure of hidden rules, selecting the important variables, their fuzzy patterns and their
connectives that affect system performance measure(s), via supervised and unsupervised
learning techniques where appropriate. Such system models would serve three purposes: (1) identification
and assessment of critical bottleneck variables that may be subject to change and modification to
improve the efficiency and effectiveness of a given capacity of productive activity; (2) prediction(s) of
performance measure indicator(s) for given customer orders under the current operating rules of
scheduling procedures; and (3) managerial short and long term policy analysis and potential re-engineering
suggestions to improve system performance measures. Naturally one would ask whether such system
models might not be developed by classical statistical partitioning methods such as principal component
analysis, multivariate regression analysis, etc. The advantages of the fuzzy system modeling
proposed here are that: (1) fuzzy knowledge representation techniques allow the extraction of highly
complex interactions of input variables and their effect on the output variables. That is, fuzzy information
granules and their fuzzy membership functions provide us with many-to-many correspondence with
varying gradations which can represent complex fuzzy patterns of interaction amongst input variables and
their effects on the output variables; and (2) fuzzy inference techniques allow reasoning with fuzzy
knowledge representation for crisp or fuzzy input patterns via forms of generalized reasoning methods such
as generalized modus ponens, etc., and hence an assessment and prediction or forecasting of system
performance measures for new customer orders.

On  the  other  hand,  it  should  be  noted  that  there  are  certain  disadvantages  to,  for  example,  multi-variate 
regression  models  which  require  certain  assumptions,  such  as:  (1)  linearity  and  additivity,  (2)  model 
structure,  i.e.,  the  order  of  the  polynomial,  (3)  normality  and  same  variance,  etc.  Fuzzy  system  models  do 
not  require  such  assumptions  to  be  made  at  all. 

That  is,  fuzzy  system  models  would  depict  and  demonstrate  non-linear  and  complex  interactions  in  a 
flexible and robust manner amongst system variables and represent their trade-offs more clearly as well as
forecast  performance  measure  indicators  under  current  operating  conditions. 

In Section 2, the theoretical and technical aspects of the structure of the fuzzy rule base are outlined briefly. In
Section 3, we present the aggregate scheduling analysis in continuous casting, with its rules and the fuzzy
patterns of variables in the rules, in order to highlight the analyses of complex interactions of variables that
affect the performance measures of caster scheduling, such as (1) tardiness of customer orders, (2) mixed
zone tonnage production, which is complementary to the tardiness measure, and (3) a higher order
performance measure that attempts to balance the interaction of tardiness and mixed zone.

Conclusions  are  presented  in  Section  4. 


FUZZY  SYSTEM  MODELING  PARADIGM 

The proposed fuzzy system modeling approach has two main modules: (1) knowledge representation and
(2) approximate reasoning. The knowledge representation module is based on a modified and improved
fuzzy c-means (FCM) algorithm, which extends the classical FCM algorithm in several respects. The
knowledge representation is developed by unsupervised learning with respect to a given input-output
data set and three parameters of fuzzy clustering. The approximate reasoning module contains four
reasoning parameters that are subject to supervised learning for a given input-output data set and error
minimization criteria.

Knowledge representation is formulated with a structure identification technique that first elicits output
patterns in terms of fuzzy clusters, then relates them to the input variables' fuzzy clusters, and hence leads to
the formation of fuzzy rules in IF...THEN patterns [13-18]. The fuzzy clustering algorithm has three
parameters: (1) the order or level of fuzziness, m ∈ [1, 4], which identifies the optimal level of fuzziness between
the crisp and infinitely fuzzy representations; (2) the number of fuzzy clusters, c ∈ [2, ..., 20]; and (3) the
location of the fuzzy cluster centres. As a result, the structure identification consists of the rule generation
activity, which has three sub-modules: (1) output and input clustering, (2) input and output membership
function assignments, and (3) input selection.
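The role of the clustering parameters can be illustrated with the classical fuzzy c-means iteration [2]. The sketch below is only the textbook algorithm, not the modified and improved version referred to above; the Euclidean distance, the random initial memberships, and the names `fuzzy_c_means`, `tol`, `max_iter` and `seed` are assumptions made for illustration. Note that the classical update requires m > 1.

```python
import numpy as np

def fuzzy_c_means(data, c, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Classical FCM (textbook form); returns (centres, memberships).

    data: array of shape (n_samples, n_features)
    c:    number of fuzzy clusters
    m:    level of fuzziness, must be > 1 (m -> 1 approaches crisp clustering)
    """
    if m <= 1.0:
        raise ValueError("m must be greater than 1")
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # Random initial membership matrix whose rows sum to 1.
    u = rng.random((n, c))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        um = u ** m
        # Cluster centres: membership-weighted means of the data.
        centres = (um.T @ data) / um.sum(axis=0)[:, None]
        # Distances from every sample to every centre.
        dist = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centres, u
```

Running such a routine on the output variable first, and then on the inputs, would give the membership matrices from which the IF...THEN rules are assembled; the selection of m and c would still have to be done separately, for instance with a cluster validity measure.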

The approximate reasoning module has four parameters: (1) two parameters, p and q, are associated with (a)
the t-norm for the AND connective of the antecedents, i.e., the input variable clusters, and (b) the t-norm or
t-conorm of the IMPLICATION connective, depending on the type of implication chosen, which
dictates the impact of the inputs on the output variable, i.e., the consequent, respectively; and (2) two other
parameters, β and α, are associated with (a) the combination of alternative reasoning methods and (b) the
power of the generalized defuzzification method, respectively. These four parameters are identified with a
supervised learning algorithm based on a training data set in order to minimize the error of the model with
respect to either the actual system output measure or a desired system output measure. These four
parameters will be discussed further in detail in the subsection below. The technology is based on our
theoretical studies [13-18].
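To make the role of such parameters concrete, the following sketch shows one parametric AND connective and one power-weighted defuzzification. The text does not name the operator families used, so the Yager t-norm and the power-weighted average below are assumptions chosen for illustration, as are the function names `yager_t_norm` and `power_defuzzify`.

```python
import numpy as np

def yager_t_norm(a, b, p):
    """Yager t-norm with parameter p > 0 (assumed family, for illustration).

    p = 1 gives the bounded difference max(0, a + b - 1);
    as p grows the result approaches min(a, b).
    """
    return np.maximum(0.0, 1.0 - ((1.0 - a) ** p + (1.0 - b) ** p) ** (1.0 / p))

def power_defuzzify(y, mu, alpha):
    """Power-weighted defuzzification of output points y with memberships mu.

    alpha = 1 gives the centroid; a large alpha concentrates the weight
    on the points of maximal membership.
    """
    w = np.asarray(mu, dtype=float) ** alpha
    return float(np.sum(np.asarray(y, dtype=float) * w) / np.sum(w))

# Firing strength of a rule whose two antecedents match to degrees 0.7 and 0.9.
strength = yager_t_norm(0.7, 0.9, p=2.0)
# Crisp output from a sampled output membership function.
crisp = power_defuzzify([0.0, 1.0, 2.0], [0.5, 1.0, 0.0], alpha=1.0)
```

In a supervised setting, parameters such as p and alpha would be tuned on the training data so that the model error is minimised, rather than fixed in advance.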




Therefore, it should be noted that fuzzy system modeling depends on (1) unsupervised learning with an
input-output data set, without requiring the acquisition of human expert knowledge, and (2) supervised
learning of the AND (conjunction) and IMPLICATION (IF...THEN) operators with respect to the error
minimization of a performance measure indicator, without requiring these operators to be selected
a priori.

In  summary,  training  data  determines:  (1)  the  structure  of  the  system  knowledge,  via  the  extraction  of 
hidden  patterns  with  unsupervised  learning,  and  (2)  the  interaction  of  these  hidden  patterns  by  specifying 
the  parameters  of  connectives  and  weighting  of  alternate  reasoning  methods  and  alternate  defuzzification 
techniques  with  supervised  learning. 


AGGREGATE  SYSTEM  ANALYSIS 

In this section, we present the key issues of aggregate system analysis of a continuous caster scheduling
system in a conceptual framework based on the technical foundation of the fuzzy system modeling paradigm
outlined in Section 2 above.

In this framework, we deal very briefly with (i) representation of the system, (ii) analysis and diagnosis of
it, and (iii) prediction of performance.

The representation of the system is made with structure identification techniques which include (i) fuzzy
cluster  analyses  of  the  output,  (ii)  projection  of  the  output  clusters  into  input  space,  (iii)  identification  of 
input  clusters,  (iv)  formation  of  input  and  output  membership  functions,  (v)  selection  of  significant  input 
variables  and  their  clusters,  and  finally  (vi)  the  determination  of  IF...  THEN  fuzzy  rules  that  expose  the 
hidden  behaviour  of  the  system. 

Essential  to  this  is  the  identification  of  fuzzy  information  granules.  As  depicted  in  Figure  1,  given  a  scatter 
diagram,  we  have  the  option  of  determining  fuzzy  information  granules  with  fuzzy  clustering  techniques  or 
fitting  a  non-linear  curve  with  multi-variate  regression  techniques.  We  choose  to  represent  our  knowledge 
about a scatter diagram with fuzzy information granules simply because, as pointed out earlier, we do not
have  to  make  assumptions  of  linearity,  additivity,  normality,  same  variance,  etc. 

In  this  manner,  we  identify  fuzzy  information  granules  of  each  variable  that  affect  the  performance  measure 
in  combination  with  all  input  variables  and  their  information  granules. 

As a result, four major hidden rules were discovered for the tardiness measure. For example, one of the
rules shows how a low tardiness measure may be obtained when the indicated information granules of the
input variables interact in a non-linear manner. The analysis of this rule indicates that the
tardiness of customer order delivery due dates will be low when the following interact together: low to
moderate priority, very low to moderate grade transition penalty, somewhat high customer order widths,
somewhat high minimum slab widths, somewhat average maximum slab widths, more or less average
qualities, very low and low customer order weights, very low and low number of slabs required per
order, and average to large slab weights.

Further analysis shows that the critical variables in the investigation of the tardiness measure are:

(i)  maximum  width  of  slabs 

(ii)  quality  of  products 

(iii)  customer  order  weights 

(iv)  weights  of  the  slabs 

In  a  similar  manner,  five  major  rules  were  discovered  for  the  mixed  zone  tonnage  production.  In  the  mixed 
zone  tonnage  analysis,  the  critical  variables  are: 

(1) phosphorus, (2) sulfur, (3) aluminum, (4) nitrogen, (5) tundish start weight, (6) tundish end weight,
and  (7)  average  casting  speed. 




Finally, a combined analysis of the tardiness and mixed zone measures produced eight hidden rules. The
investigation of the rules shows that a low cost of balancing tardiness versus mixed zone production could be
achieved when very low to somewhat low grade transition penalties are combined with average to
somewhat large maximum slab widths, high quality, low customer order weights, low number
of slabs, somewhat low carbon, very low to low Mn, very low P, somewhat low S, low
to high Si, low Ti, average tundish start weights, and average values of the average casting speed.


Figure 1. Formation of fuzzy information granules vs. multi-variate regression curve fitting (precise vs. granular relations: a precise functional graph contrasted with fuzzy information granules).

COMPARISON  WITH  REGRESSION  MODELS 

In order to compare the fuzzy system model, we modeled the same data sets with multi-variate regression
techniques. It is to be observed that in the tardiness regression model, customer order widths and minimum
slab widths are excluded from the model. In the mixed zone regression model, no variable is considered to
be significant. Further analysis of the results reveals that, in the tardiness analyses, the standard deviation
of the residuals (errors) of multiple regression is 14.69, whereas that of the fuzzy model is 3.9. In the mixed
zone tonnage analyses, the standard deviation of the residuals (errors) of multiple regression is 14.86,
whereas that of the fuzzy model is 4.33.

On the basis of these results, it is concluded that fuzzy system modeling gives lower prediction errors than
multiple regression on these data sets; hence, fuzzy system models are a better representation of hidden system behaviour.
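The comparison above rests on the standard deviation of the residuals. As a minimal sketch, the helper below computes that quantity for any model's predictions and then forms the ratios of the figures reported in the text. The name `residual_std` and the use of the sample standard deviation (`ddof=1`) are assumptions; the text does not state which estimator was used.

```python
import numpy as np

def residual_std(actual, predicted):
    """Sample standard deviation of the residuals (actual - predicted)."""
    residuals = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.std(residuals, ddof=1))

# Standard deviations of residuals as reported in the text.
reported = {
    "tardiness": {"regression": 14.69, "fuzzy": 3.9},
    "mixed zone tonnage": {"regression": 14.86, "fuzzy": 4.33},
}
# How many times larger the regression residual spread is than the fuzzy one.
ratios = {name: r["regression"] / r["fuzzy"] for name, r in reported.items()}
```

On the reported figures, the residual spread of multiple regression is roughly 3.8 times that of the fuzzy model for tardiness and roughly 3.4 times for mixed zone tonnage.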

REFERENCES 

1.  Bezdek,  J.C.,  Windham,  M.P.,  Ehrlich,  R.,  1980.  Statistical  parameters  of  cluster  validity  functional. 
International  Journal  of  Computer  Information  Science,  9(4):  324-336. 

2.  Bezdek,  J.C.,  1981.  Pattern  recognition  with  fuzzy  objective  function  algorithms.  Plenum  Press,  NY. 

3.  Duda,  R.O.,  Hart,  P.E.,  1973.  Pattern  classification  and  scene  analysis.  John  Wiley  and  Sons,  Inc. 

4.  Emami, M.R., Turksen, I.B., Goldenberg, A.A., 1996. An improved fuzzy modeling algorithm, part I:
inference mechanisms. Proc. NAFIPS'96, Berkeley, CA: 289-293.

5.  Emami,  M.R.,  Turksen,  I.B.,  Goldenberg,  A.A.,  1996.  An  improved  fuzzy  modeling  algorithm,  part 
II:  system  identification.  Proceedings  of  1996  Biennial  Conference  of  the  North  American  Fuzzy 
Information  Processing  Society-NAFIPS:  294-298. 

6.  Fukuyama, Y., Sugeno, M., 1989. A new method of choosing the number of clusters for fuzzy
c-means  method.  Proc.  5th  Fuzzy  Systems  Symposium  (in  Japanese)  247-250. 

7.  Kandel,  A.,  1982.  Fuzzy  techniques  in  pattern  recognition.  Wiley,  New  York. 

8.  Kaufman, L., Rousseeuw, P.J., 1990. Finding groups in data. Wiley, New York.

9.  Keller, J.M., Gray, M.R., Givens, J.A., 1985. A fuzzy k-nearest algorithm. IEEE Trans. Systems,
Man, and Cybernetics SMC-15(4): 580-585.

10.  Kosko,  B.,  1997.  Fuzzy  engineering.  Prentice  Hall,  New  Jersey,  U.S.A. 

11.  Nakanishi, H., Turksen, I.B., Sugeno, M., 1993. A review and comparison of six reasoning methods.
Fuzzy Sets and Systems, 57: 257-295.




12.  Slany,  W.,  1996.  Scheduling  as  a  fuzzy  multiple  criteria  optimization  problem.  Fuzzy  Sets  and 
Systems,  78:  197-222. 

13.  Sugeno, M., Yasukawa, T., 1993. A fuzzy-logic based approach to qualitative modeling. IEEE Trans.
Fuzzy  Systems  1(1):  7-31. 

14.  Turksen,  I.B.,  1986.  Interval-valued  fuzzy  sets  based  on  normal  forms,  Fuzzy  Sets  and  Systems, 
20(2):  191-210. 

15.  Turksen, I.B., 1995. Fuzzy normal forms. Fuzzy Sets and Systems, 69: 319-346.

16.  Turksen,  I.B.,  1995.  Type  1  and  interval-valued  Type  II  fuzzy  sets  and  logics,  in:  P.P.  Wang, 
Advances  in  Fuzzy  Theory  and  Technology,  Bookright  Press,  Raleigh,  NC  3:31-82. 

17.  Turksen,  I.B.,  1996.  Fuzzy  truth  tables  and  normal  forms.  Proceedings  of  BOFL'96,  December  15-18, 
1996,  TIT,  Nagatsuta,  Yokohama,  Japan:  7-12. 

18.  Turksen,  I.B.,  1996.  Type  I  and  Type  II  fuzzy  system  models.  Special  Issue,  FSS. 

19.  Ward,  J.H.,  1963.  Hierarchical  grouping  to  optimize  an  objective  function.  J.  Amer.  Stat.  Assoc.  58: 
236-244. 

20.  Yager,  R.R.,  Filev,  D.P.,  1994.  Essentials  of  fuzzy  modeling  and  control.  John  Wiley  and  Sons. 




From  Intelligent  Models  To  Smart  Ones 


Heikki  Hyotyniemi 


Helsinki  University  of  Technology 
Control  Engineering  Laboratory 
P.O.  Box  5400,  FIN-02015  HUT,  Finland 


ABSTRACT 

The "intelligent" modelling methods may not be the ultimate solution. This paper discusses the new
questions that have emerged after the introduction of the modern soft computing methods, and proposes a
framework for attacking the new problems.

INTRODUCTION 

The field of “computational intelligence” can today be seen in perspective. It seems that the new
modelling  methods,  neural  networks  or  fuzzy  systems,  are  sometimes  applied  without  even  thinking  of  the 
other,  more  traditional  alternatives.  The  tool  has  become  more  important  than  the  application  itself. 
However,  in  modelling,  the  primary  task  is  to  capture  the  properties  of  the  real-life  process  in  the  best 
possible  way;  the  tool  for  accomplishing  this  should  be  selected  accordingly. 

The “intelligent” techniques promise to make the modelling task transparent to the user, and that is why
they have been eagerly awaited by practitioners tackling complex processes. When applying the new
methodologies, the normal procedure is to take one of the general-purpose soft-computing
algorithms and apply it like any other off-the-shelf tool. Multi-layered feedforward perceptron
networks, for example, have been shown to be capable of representing any smooth function, so presumably
they can also be used as a model for the process at hand. However, from the point of view of process
modelling, there are two main problems with the current soft-computing methodologies:

•  First, they seem to be too powerful; to constrain the number of alternative behavioural patterns in the
network, the number of training data samples needed grows exponentially as a function of the number of free
parameters (the models do not “scale up”; whereas modelling in a toy domain can be carried out, in a
more complex environment problems soon become overwhelming).

•  Second,  no  structure  emerges  in  the  training  process;  the  trained  network  (or  the  conventional  compiled 
fuzzy  rule  set)  is  just  a  numerical  mapping  from  input  signals  to  output  signals. 

The role of structure is of utmost importance in intelligent modelling. When huge industrial processes with
thousands  of  simultaneous  measurement  signals  are  being  monitored,  a  tool  that  is  capable  of  extracting  the 
underlying  relationships  between  the  data  samples  may  help  the  human  expert  to  gain  some  intuition  or 
understanding  of  the  process  structure.  This  “understanding”  is  the  key  towards  truly  intelligent-looking 
modelling  results. 

In  this  paper,  the  current  modelling  paradigms  are  reviewed  from  the  above  points  of  view.  As  a  conclusion, 
a  unified  framework  is  proposed. 


ABOUT  "SMART  MODELS" 

The  role  of  a  model  is  to  capture  system  behaviour  in  a  compact  form  to  facilitate  analyses  and  applications. 
This  dual  role  is  a  delicate  matter:  constraints  come  from  two  directions,  and  compromises  are  needed  (see 
Fig.  1). 




Fig.  1.  The  role  of  the  model  as  an  "interface"  between  the  system  and  the  outside  world.  The 
model  should  match  the  properties  of  the  system,  and,  simultaneously,  it  should  match 
the  needs  of  the  humans  (or  further  applications)  that  use  the  model. 


Constraints  From  Below 

First, study the boundary conditions that are determined by the system to be modelled. To look even
remotely smart, the modelling tool must support the structure of the system. There is a problem, though: to
support the structure of a specific application, the model should be tailor-made, and so generality has to
be sacrificed. The solution to this problem is to exclude system structure from the kernel of the model and,
rather, to design the model structure so that structural properties of the system can be easily implemented in
that framework. The question then arises, "what kinds of structural features are to be supported?" We want to
restrict ourselves to the absolute minimum complexity to achieve ease of use (trainability and analysability, as
discussed later). Loosely speaking, this can be expressed as: the structural properties to be supported need to
be "enumerated" or "parameterised".


It is clear that, as a starting point, linearity of the model structure is a good first approximation: linear
models are often at least locally valid, and they are backed by strong theory.

In addition to selecting a linear model structure as the general framework, the domain-specific system
structure must be captured. To achieve this, the families of functional nonlinearities that are encountered in a
complex process must be detected. There are also nuances to be recognised in what is apparent in
strictly linear systems. It is assumed that the basic types of behavioural patterns that are typically found in
complex real systems are the following:

•  Data clusters are caused by the underlying system alternating between essentially different internal
structures, each of them having a local dependency structure between the variables. Take the flotation
process (see [1]) as an example: ore coming from different mines has varying mineral
content, and the process behaviour changes accordingly. Similarly, different control strategies may
result in different clusters in the data. There is no real continuum between the clusters.

•  Continuous  nonlinearities  are  common  in  practical  systems.  However,  the  nonlinearities  are  normally 
smooth,  so  they  are  locally  linearisable.  This  way  the  need  to  cover  an  infinite  number  of  non-linear 
behaviours  can  be  avoided. 

•  Independent  components  are  hidden  variables  that  define  mutually  non-orthogonal  subspace  directions 
within  linear  data.  In  flotation  data  for  example,  the  independent  subspace  directions  in  the  data  can  be 
determined  by  the  different  process  inputs:  control  actions  result  in  characteristic  responses  in  the 
process  variables.  The  different  control  signals  may  have  partially  overlapping  effects;  that  is  why  the 
subspace  axes  are  not  orthogonal. 

It should be mentioned that modelling problems only become acute in really large systems. In toy
domains, all methods can be made to work, however fancy they are and no matter how high their
computational complexity. In real environments, the data are high dimensional. This high dimensionality is,
of course, caused by the large number of measurements, but this is only one part of the story. When
complex phenomena are recorded, the information is often delivered in different formats - there may be
image data, qualitative measurements, etc. When this kind of multimodal information is processed, it must
first be coded as real numbers, so the problem of data heterogeneity is converted to one of high
dimensionality. The modelling tool must be insensitive to the large quantity of data. The high amount of
poorly preprocessed data also means that some of the data are badly conditioned: there may exist collinearity
among the measurements, and irrelevant data may exist.
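The effect of collinearity on conditioning can be seen in a small numerical experiment. The sketch below uses synthetic data only; the variable names and the helper `covariance_condition_number` are hypothetical and serve to illustrate the point, not to reproduce any measurement set discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)

# Two unrelated measurement channels.
independent = np.column_stack([x1, x2])
# A third channel that is almost a linear combination of the first two.
collinear = np.column_stack([x1, x2, x1 + x2 + 1e-3 * rng.normal(size=200)])

def covariance_condition_number(data):
    """Ratio of the largest to the smallest eigenvalue of the sample covariance."""
    eigenvalues = np.linalg.eigvalsh(np.cov(data, rowvar=False))  # ascending order
    return float(eigenvalues[-1] / eigenvalues[0])

cond_independent = covariance_condition_number(independent)
cond_collinear = covariance_condition_number(collinear)
```

The nearly redundant channel pushes the condition number up by several orders of magnitude, which is why methods that invert the covariance matrix directly become unreliable on such data.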

It turns out that the above assumptions about data ontology offer powerful conceptual tools for mastering
complex systems.




Constraints  From  Above 

An equally important aspect of the value of a model is how it looks from the outside. Here, too, there are
various aspects to take care of, for example:

•  Model applicability is important if something such as prediction or control is to be implemented. The
model structure should be flexible; from a control engineering point of view, for example, a very basic
need is that input-output relationships can be represented.

•  Efficiency  in  terms  of  training  time  needed  to  fix  the  model  parameters  is  an  essential  practical  factor. 
Another  practical  value  of  the  model  is  its  robustness  or  consistency,  i.e.,  how  reliable  is  the  training 
method;  will  the  results  be  the  same  if  the  training  process  is  repeated? 

•  Comprehensibility reveals how well the model and the training results can be interpreted in a form that is
understandable to a human. Analysability is another facet of the same transparency objective: a good
model makes it easy for mathematical tools to operate on the data structures.

Whereas  most  of  the  above  criteria  are  strictly  technical,  the  comprehensibility  objective  is  crucial  from  the 
point  of  view  of  artificial  intelligence  (AI).  To  be  called  truly  intelligent,  the  created  model  must  offer  some 
added  value  such  as  new  intuition  to  the  user.  Automatically-created  constructs  should  offer  a  bridge  from 
the numeric level to the conceptual level. This may sound unrealistic - but all "intelligent" methods that do
not address this question are deficient by definition!

TODAY'S TOOLS

Here, some of the prototypical data modelling methods are reviewed from the above perspective. None of them
fulfils all of the presented needs. What is more, none of them can be regarded as truly "intelligent".

Principal component analysis, PCA (and its related regression method, PCR), is based on multivariate linear
systems theory [2] and results in an orthogonal set of mathematically motivated hidden variables. On the other
hand, independent component analysis, ICA, is a novel method developed especially to find the independent
component structure [3]. The self-organising map, SOM, is a neuron-motivated clustering method that
makes relationships within the data nicely visible [4]. The feed-forward perceptron networks [5] and fuzzy
systems [6], denoted here as NN and FS, respectively, are perhaps the most common representatives of
computational intelligence, and they are also included in the comparison below.


Table 1. How the different methods address the properties of the systems to be modelled.

System property                    PCA     ICA     SOM     NN      FS
Local nature of representations    --      -       ++      -       ++
Continuous nonlinearities          --      --      +       ++      ++
Independent components             -       ++      (--)    (--)    (--)
High dimensionality                ++      -       +       --      --

Table 2. How the different methods address the practical issues.

Objective                             PCA     ICA     SOM     NN      FS
Model applicability                   ++      ++      -       +       +
Efficiency, trainability              ++      +       -       -       (-)
Robustness, consistency               ++      +       -       -       (-)
Comprehensibility and analysability   ++      ++      ++      --      (+)


Tables 1 and 2 summarise the pros and cons of the different modelling approaches. In many cases the
comparisons are difficult because the methods are tailored for different kinds of applications. However, they
are sometimes advertised as a panacea, so perhaps some criticism is justified after all. The parentheses in
the tables mean that the properties cannot be evaluated: for example, fuzzy rule bases are normally
explicitly determined, so the criteria concerning automatic generation of the models cannot be applied.
Only methods that are explicitly based on linear theory can be used for modelling independent components
in the data. One '+' or '-' sign instead of two is used if only a partial solution is offered by the method; for
example, SOM is especially tailored for visualising high-dimensional data, but it is necessary that there be
only a few degrees of freedom, the data being distributed on a 2-dimensional manifold, etc.


SPARSITY  -  THE  KEY? 

The question that arises is how the objectives can be combined in the same framework. It turns out that a
technique called sparse coding offers a very promising alternative. Sparsity means that the model consists of
a set of simple constituents; at any time only a subset of all available constructs is utilised. In the sparse
coding framework, it is not the total number of memory elements that is to be minimised; it is the number
of simultaneously active elements that is minimised [7].

Assume that the underlying substructures are defined as vectors spanning dimensions in the data
space, so the model becomes piecewise linear, the observation data vector $f$ being presented as a weighted
sum of the features $\varphi_i$, with weights $\xi_i$:

$$ f = \sum_{i=1}^{n} \xi_i \, \varphi_i \qquad (1) $$

To better match the data modelling needs, the purely mathematical sparse coding formulation can be
slightly refined. Separate one of the selected features, so that it stands for the operating point, and let the
other features represent "fine-tuning" around this cluster centre vector. Assume there exist n1 alternative
operating points $\varphi^{(1)}$, and out of the n2 additional features $\varphi^{(2)}$ only N are utilised:

$$ f = \varphi^{(1)}_{c} + \sum_{i=1}^{N} \xi_i \, \varphi^{(2)}_{c_i}, \qquad c \in [1, n_1], \quad c_i \in [1, n_2] \qquad (2) $$

Examples 

To  gain  some  intuition,  the  different  system  properties  are  implemented  in  the  sparse  coding  framework.  In 
Figs.  2,  3,  and  4,  examples  of  data  clusters,  nonlinearities,  and  independent  components  are  shown.  It  needs 
to  be  recognised  that  the  low-dimensional  projections  of  the  assumedly  very  high-dimensional  data  give  a 
trivialised  view  of  the  problem  complexity. 


Figure 2. Clustered data; sparse model with n1 = 3, n2 = 1, and N = 1.


Let us study Fig. 2 a bit closer. If the clustered data are studied as a whole, no PCA-like data compression
scheme can be applied: the data are genuinely 3-dimensional, the data covariance matrix having three
approximately equal eigenvalues. The sparse model now contains four constructs (n = n1 + n2) - more than
the dimensions of the data! Seen from a traditional modelling point of view, this "inflation" of the
representation is intolerable. All individual data points can, however, be represented rather accurately using
only two constructs: the appropriate cluster centre vector and the single fine-tuning parameter.
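The idea of representing a data point by one cluster centre plus a few fine-tuning features can be sketched as follows. This is a simple greedy encoder written for illustration, assuming the centres and features are already known; it is not the training algorithm used to obtain the models in the figures, and the name `sparse_encode` and its arguments are hypothetical.

```python
import numpy as np

def sparse_encode(f, centres, features, n_active=1):
    """Encode f as one cluster centre plus n_active weighted features.

    centres:  array (n1, dim), candidate operating points
    features: array (n2, dim), candidate fine-tuning directions
    Returns (centre index, active feature indices, weights, reconstruction).
    """
    # Operating point: the nearest cluster centre.
    c = int(np.argmin(np.linalg.norm(centres - f, axis=1)))
    residual = f - centres[c]
    # Greedy choice: the features most aligned with the residual.
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    scores = np.abs(unit @ residual)
    active = np.argsort(scores)[::-1][:n_active]
    # Least-squares weights for the selected features only.
    basis = features[active].T
    weights, *_ = np.linalg.lstsq(basis, residual, rcond=None)
    return c, active, weights, centres[c] + basis @ weights

# Three cluster centres and one fine-tuning direction, as in a model
# with n1 = 3, n2 = 1 and N = 1 (the numbers are made up).
centres = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
features = np.array([[0.0, 0.0, 1.0]])
c, active, weights, reconstruction = sparse_encode(
    np.array([5.0, 0.0, 0.7]), centres, features
)
```

Here the point is described by two numbers, the index of its centre and one weight, even though the model as a whole holds four vectors.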


Figure 3. Nonlinear behaviour; sparse model with n1 = 4, n2 = 2, and N = 1


Figure 4. Independent components; sparse model with n1 = 1, n2 = 2, and N = 1

It  can  be  said  that  the  sparsity  results  in  an  emergent  structure.  The  constructs  that  facilitate  compact 
representations  tell  something  fundamental  about  the  underlying  system.  It  can  even  be  claimed  that  one 
step  towards  the  truly  intelligent  model  has  been  taken:  the  sparse  representation  of  a  system  is  "symbolic 
but  not  labelled"  -  the  interpretation  of  the  extracted  dependency  structures  must  be  carried  out  separately 
by  the  human  domain  area  expert. 


DISCUSSION 

Even  though  the  sparse  model  is  "almost"  linear,  it  incorporates  all  of  the  assumed  nonlinear  phenomena; 
the  operation  of  the  model  is  linear  only  after  the  appropriate  constructs  have  been  selected.  After  the  sparse 
structure  is  fixed,  the  model  can  be  written  in  a  matrix  form 


$$ f = \varphi_f + \Phi_f \, \xi(f) \qquad (3) $$

where the operating point (the cluster centre) vector $\varphi_f$ and the matrix $\Phi_f$ containing the feature vectors
are somehow determined by the current measurement data vector $f$. The "generalized state" vector $\xi(f)$ can
be determined in the least squares sense as:

$$ \xi(f) = \left( \Phi_f^T W \Phi_f \right)^{-1} \Phi_f^T W \left( f - \varphi_f \right) \qquad (4) $$

Above, the diagonal weighting matrix W is essential: it is the key to generality. Formally, no output has been
defined in Equations 1 to 4; all signals, no matter what their role, are coded in the data vector f. When the
model is used for prediction purposes, for example, the nature of the vector elements becomes clear: the
output signals are unknown, whereas the inputs can be measured. When the entries in W that correspond to
the unknown variables are set to zero, those variables are not used for matching measurements against the model. The
reconstructed f, calculated from Equation 3, reveals the "best guesses" for the unknowns, so that associative
regression can readily be implemented using the general model framework. The generality of the model
formulation makes it possible to apply it in very different applications - for example, pattern recognition
can be implemented, and associative reasoning systems have also been experimented with. Technical
applications include, for example, feature extraction in visual images of a flotation process [8].
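The associative regression described above can be sketched in a few lines. The code assumes that the operating point vector and the active feature matrix have already been selected, and it reads Equation 4 as the weighted least-squares estimate of the generalized state, with W a 0/1 diagonal matrix. The function name `associative_regression` and the mask argument `known` are hypothetical.

```python
import numpy as np

def associative_regression(f, known, phi, Phi):
    """Estimate the unknown entries of f from the known ones.

    f:     data vector (values at unknown positions are ignored)
    known: boolean mask, True where the entry of f has been measured
    phi:   operating point (cluster centre) vector, shape (dim,)
    Phi:   matrix whose columns are the active feature vectors, shape (dim, N)
    """
    f = np.where(known, f, 0.0)           # placeholder values get zero weight
    W = np.diag(known.astype(float))      # zero weight for the unknown entries
    A = Phi.T @ W @ Phi
    b = Phi.T @ W @ (f - phi)
    xi = np.linalg.solve(A, b)            # weighted least-squares state
    return phi + Phi @ xi, xi             # reconstruction and state

# Made-up example: two measured inputs and one unknown output.
phi = np.array([1.0, 2.0, 3.0])
Phi = np.array([[1.0], [0.0], [2.0]])
f_observed = np.array([1.5, 2.0, np.nan])
known = np.array([True, True, False])
f_hat, xi = associative_regression(f_observed, known, phi, Phi)
```

The last element of `f_hat` is the "best guess" for the unmeasured output. The normal equations are solvable only if the measured entries alone determine the state, that is, if the weighted feature matrix has full column rank.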

A straightforward implementation of the above view is an algorithm called GGHA (Generalised GHA).
This algorithm can be interpreted as an extension of PCA, being based on self-organisation and explicit
sparse coding of the data [9,10]. The sparsity property also makes it possible to capture independent
components (having positive kurtosis). The models in the examples of the previous section were calculated
using this algorithm, and the parameters n1, n2, and N were explicitly given.

So,  finally,  how  well  have  we  reached  our  objectives?  The  first  class  of  constraints,  those  related  to  the 
ability  to  represent  the  system  structure,  are  all  rather  nicely  satisfied  (this  is  natural  since  these  objectives 
were  explicitly  taken  as  targets).  Being  based  on  linear  constructs,  the  model  structure  can  also  meet  the 
needs  "from  above"  rather  well.  One  of  the  weakest  points  is  the  lack  of  robustness;  the  model  does  not 
always  converge  in  the  same  way.  Actually,  this  cannot  be  avoided  since  the  distinction  between  a  cluster 
and  a  feature  is  not  exact  (this  phenomenon  is  fundamental:  when  it  comes  to  mental  categories,  we  have 
noticed  that  the  boundaries  between  classes  are  not  clear-cut,  but,  rather,  determined  by  relevance).  For  this 
reason  the  algorithms  implementing  the  proposed  view  must  be  iterative  and  this  may  cause  efficiency 
problems  in  practice. 


REFERENCES 

1.  A.  J.  Niemi,  R.  Ylinen,  H.  Hyotyniemi,  1997.  On  characterization  of  pulp  and  froth  in  cells  of  flotation 
plant.  Int.  J.  of  Mineral  Processing,  51,  51-65. 

2.  A.  Basilevsky,  1994.  Statistical  Factor  Analysis  and  Related  Methods.  John  Wiley  &  Sons,  New  York. 

3.  T.-W.  Lee,  1998.  Independent  Component  Analysis  -  Theory  and  Applications.  Kluwer  Academic 
Publishers,  Boston. 

4.  T.  Kohonen,  1995.  Self-Organizing  Maps.  Springer-Verlag,  Berlin. 

5. S. Haykin, 1994. Neural Networks - A Comprehensive Foundation. Macmillan College Publishing, NY.

6.  L.-X.  Wang,  1997.  A  Course  in  Fuzzy  Systems  and  Control.  Prentice-Hall  International,  London. 

7.  E.  Saund,  1995.  Multiple  cause  mixture  model  for  unsupervised  learning.  Neural  Computation,  7,  51-71. 

8. A. J. Niemi, H. Hyotyniemi, R. Ylinen, 1999. Image analysis and vision systems for processing plants.
Proc. 2nd Intl. Conf. on Intelligent Processing and Manufacturing of Materials, IPMM99, Honolulu, HI.

9.  H.  Hyotyniemi,  1998.  Automatic  Structuring  of  Unknown  Dynamic  Systems.  In  Soft  Computing  in 
Engineering  Design  and  Manufacturing  (P.K.  Chawdhry,  R.  Roy,  and  R.K.  Pant,  eds.),  Springer-Verlag, 
London,  410-419.  available  at  http://Saato014.hut.fi/Hvotvniemi/publications/97  wsc2.htm. 

10. H. Hyotyniemi, 1998. Structure from Data: AI Approaches to Systems Modeling. Proc. 8th Finnish AI
Conference STeP'98 (P. Koikkalainen and S. Puuronen, eds.), Finnish Artificial Intelligence Society
(FAIS), Helsinki, 31-40. available at http://Saato014.hut.fi/Hvofsmiemi/publications/98  step  l.htm.




A  Fuzzy  Design  Evaluation  Based 
on  Taguchi  Quality  Approach 

A.  Donnarumma  *,  N.  Cappetti*,  M.  Pappalardo  *,  E.  Santoro  ** 

*  Dipartimento  Ingegneria  Meccanica ,  Universita  di  Salerno,  Italy 
**  Dipartim.  di  Progettazione  e  Gestione  Industriale,  Universita  di  Napoli  Fed.  II,  Italy 


ABSTRACT 

A fuzzy method for handling vagueness and imprecision in the description of requirements for multi-attribute
decision making (MADM) problems is presented. The method is applied to the design of an apron conveyor to
collect and transfer scrap. The aggregation function for the overall evaluation is obtained using the
Taguchi loss functions.


INTRODUCTION 

The design process is essentially a decision process in which a choice between different design alternatives
must be made on the basis of evaluations obtained by analysing assigned characteristics, requirements,
performances, etc.

In most design situations the description of some requirements is imprecise, vague or expressed in terms
of linguistic concepts (expensive, heavy, reliable, etc.); under such conditions it is convenient to
use fuzzy sets for representing and manipulating such imprecision and vagueness [1,2,3].

Two important problems arise when fuzzy sets are used in the design process: the choice of an aggregation
function for computing the overall evaluation, and the construction of membership functions (m.f.) for each
imprecise requirement [4].

These problems are strongly related, since both require the designer's experience and expert judgement,
and their solutions are highly subjective.

In fact, the construction of membership functions is based on expert judgement, and the methods may be
classified as direct or indirect [5]. When the designer assigns a membership value equal to one, this means
that the requirement is fully satisfied. Therefore, taking the value one as the target value, a deviation
from it may be interpreted as a reduction in the quality of the designed product. This basic idea is used in
this paper to define an aggregation function which, by combining the deviations of each requirement, serves
to compute an overall evaluation of each design alternative.

This approach is not new: Taguchi proposed a similar approach [6], introducing some very simple
concepts for quality control in the manufacturing industries which should guide the designer through the
design process.

He emphasised that quality is directly related to the deviation of a given design parameter from its target
or desirable value, and developed more than 68 loss functions. He also believed that the target value can
be elicited only by drawing on designer experience and preference, expert judgement, etc.

The Taguchi "loss functions" concern not only the design parameters but also any characteristics and
requirements that contribute to the designer's conception of quality, since product quality begins to
deteriorate gradually as the actual parameter and requirement values deviate from the corresponding
desirable (target) values.


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




In general, many situations are described by the following quadratic loss function [7], which is called
nominal-the-best:

$$L_R(x) = k\,(Y - Y_0)^2 \qquad (1)$$

where $Y$ is the quality characteristic, $Y_0$ is the corresponding target value and $k$ is a quality loss
coefficient. Other common loss functions are smaller-the-better and larger-the-better.

Substituting 1 for $Y_0$ and the membership value $\mu_R(x)$ for $Y$, with $k = 1$, Eq. (1) becomes

$$L_R(x) = [1 - \mu_R(x)]^2 \qquad (2)$$

which is a fuzzy complement operation; in fact, $L_R(x)$ can be interpreted as the degree to which x does
not belong to R.

In general, the design process involves many requirements $R_i$, $i = 1, \ldots, m$, of unequal preference;
it is therefore useful to associate with each requirement $R_i$ a weight $w_i \in [0,1]$. Under this
condition, an aggregation function based on Eq. (2) for computing the overall evaluation is

$$P(x) = \frac{\sum_i w_i\, L_{R_i}(x)}{\sum_i w_i} = \frac{\sum_i w_i\, [1 - \mu_{R_i}(x)]^2}{\sum_i w_i} \qquad (3)$$


The above function can be very useful for solving a fuzzy multi-attribute decision making (MADM)
problem, since the design alternative with the lowest overall value will be the most "efficient" solution.
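The aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the two-requirement example values are hypothetical, and the only formulas used are the quadratic loss of Eq. (2) and the weighted mean of Eq. (3).

```python
def quadratic_loss(mu):
    """Eq. (2): loss of one requirement given its membership value mu in [0, 1]."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("membership value must lie in [0, 1]")
    return (1.0 - mu) ** 2


def overall_evaluation(memberships, weights):
    """Eq. (3): weighted mean of the quadratic losses (lower is better)."""
    if len(memberships) != len(weights):
        raise ValueError("one weight is needed per requirement")
    total_weight = sum(weights)
    if total_weight <= 0.0:
        raise ValueError("at least one weight must be positive")
    weighted = sum(w * quadratic_loss(mu) for mu, w in zip(memberships, weights))
    return weighted / total_weight


# Hypothetical example: two alternatives rated on two requirements.
weights = [1.0, 0.5]
alt_a = overall_evaluation([0.9, 0.5], weights)  # (0.01 + 0.5 * 0.25) / 1.5
alt_b = overall_evaluation([0.7, 0.8], weights)  # (0.09 + 0.5 * 0.04) / 1.5
best = "a" if alt_a < alt_b else "b"
```

In this made-up case alternative b has the lower overall loss, so it would be preferred even though alternative a scores higher on the more heavily weighted requirement.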


FUZZY  PROPERTIES  OF  LOSS  FUNCTION 

A fuzzy complement of a fuzzy set A with respect to the universal set X is defined by the function

$$c(\mu_A(x)) : [0,1] \to [0,1] \qquad (4)$$

For any $x \in X$, the value $c(\mu_A(x))$ may be interpreted as the degree to which x does not belong to A.
An example of a fuzzy complement is the well-known standard fuzzy complement function
$c(\mu_A(x)) = 1 - \mu_A(x)$. Obviously, function (4) is independent of x and, to produce a meaningful
complement of a given fuzzy set A, it must possess at least the following boundary and monotonicity
properties:

$$c(\mu_A(x) = 1) = 0, \qquad c(\mu_A(x) = 0) = 1$$

$$\forall\, \mu_A(x_i) \le \mu_A(x_j): \quad c(\mu_A(x_i)) \ge c(\mu_A(x_j))$$

An element of X for which $c(\mu_A(x)) = \mu_A(x)$ is called an equilibrium point; for the standard
complement function, the equilibrium point is that for which $\mu_A(x) = 0.5$.

A complement function that also satisfies the property $c(c(\mu_A(x))) = \mu_A(x)$ is said to be involutive.
An example of an involutive fuzzy complement is the Sugeno function

$$c_\lambda(\mu(x)) = \frac{1 - \mu(x)}{1 + \lambda\,\mu(x)} \quad \text{for } \lambda \in\, ]-1, \infty[$$

We observe that the quadratic loss function (2) is a fuzzy complement function which satisfies the boundary
and monotonicity properties, but it is not involutive; moreover, its equilibrium point is at $\mu_A(x) = 0.382$.
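The properties discussed above can be checked numerically. The sketch below is illustrative only: the helper names and the sampling grid are assumptions, and the closed form used for the equilibrium point follows from solving $(1 - \mu)^2 = \mu$, i.e. $\mu^2 - 3\mu + 1 = 0$, whose root in [0, 1] is $(3 - \sqrt{5})/2 \approx 0.382$.

```python
import math


def standard_complement(mu):
    return 1.0 - mu


def quadratic_loss(mu):
    """Quadratic loss of Eq. (2), viewed as a fuzzy complement."""
    return (1.0 - mu) ** 2


def sugeno_complement(mu, lam):
    """Sugeno complement; lam must be greater than -1."""
    if lam <= -1.0:
        raise ValueError("lambda must be greater than -1")
    return (1.0 - mu) / (1.0 + lam * mu)


def grid(samples=101):
    """Evenly spaced membership values covering [0, 1]."""
    return [i / (samples - 1) for i in range(samples)]


def is_nonincreasing(c):
    values = [c(mu) for mu in grid()]
    return all(a >= b for a, b in zip(values, values[1:]))


def is_involutive(c, tol=1e-9):
    return all(abs(c(c(mu)) - mu) <= tol for mu in grid())


# Root of (1 - mu)^2 = mu that lies in [0, 1].
quadratic_equilibrium = (3.0 - math.sqrt(5.0)) / 2.0

checks = {
    "standard involutive": is_involutive(standard_complement),
    "sugeno (lambda=2) involutive": is_involutive(lambda mu: sugeno_complement(mu, 2.0)),
    "quadratic involutive": is_involutive(quadratic_loss),
    "quadratic monotone": is_nonincreasing(quadratic_loss),
}
```

With these definitions the standard and Sugeno complements pass the involution check, while the quadratic loss passes only the boundary and monotonicity checks.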




Fig. 1 shows a graphical representation of a triangular membership function and the corresponding
fuzzy loss function.


Fig.  1.  Triangular  membership  function  and  corresponding  loss  function 


APPLICATION 

A metal stamping company needs apron conveyors to collect and transfer scrap [8]. Three alternative
systems with different forms of conveyor plate are analysed. The systems are:

x1 - a modular stamped-plate conveyor. Two edges of each plate are bent to form pivot seats for
joining the plates and the chains. Lateral plates are welded on to prevent scrap loss.

x2 - a modular plate conveyor with welded tubes. The same as x1, but the pivot seats are welded tubes.

x3 - a modular stamped-plate conveyor. The plates are not joined to each other; each one
leans on the next. Lateral plates are fixed with screws to prevent scrap loss. The same
screws join the plates to the chains.

These systems differ not only in plate typology, but also in processing, materials and geometric
characteristics. However, all systems must work properly and satisfy, at least to some degree, certain
functional requirements. With the requirements defined by fuzzy sets, the problem is to assign membership
values to each requirement R_i. To achieve this, industry experts were carefully selected, and the
following membership functions were suggested for each requirement:

R1 - Investment costs: this represents the economic value of the processing and assembly of the conveyor
equipment. For costs within [0, 50M] the m.f. decreases linearly from 1 to 0.5, where 50M is the
maximum value.

R2 - Working costs: this represents the operating cost of the conveyor process and assembly. The
membership function is defined by an indirect method with one expert.

R3 - Simplicity of maintenance: the simplicity and speed of maintenance operations are important for this
requirement. The m.f. is M(A_t, A_i) = p_t M_At + p_i M_Ai, where M_At and M_Ai are, respectively, the
time-to-operation score and the time-of-operation score, while p_t and p_i are the corresponding weights.
The scores are expressed by an expert and shown in Table 1 (indirect method with one expert).

R4 - Loss of scrap: the main reason for scrap loss is the nominal gap G between plates, since the minimal
thickness of produced scrap is 0.8 mm and the tolerance for all systems is ±0.2. The m.f. is 1 for G <
0.6, decreases to 0 over G in [0.6, 1.0], and is 0 for G > 1.

R5 - Reliability: this m.f. is defined by an expert and shown in Table 2 for attribute = reliable.

R6 - Assembly simplicity: the m.f. is defined by an expert and shown in Table 2 for attribute = simple. This
requirement represents the simplicity of assembly operations.

R7 - Availability of materials: the m.f. is a linearly decreasing function of the number of unavailable
components, since the time and cost of realisation are reduced when the materials needed to construct the
system are already available.




Table 1. Maintenance scores MAt and MAi

MAt            MAi            Score
30 days        7 minutes      1
25 days        30 minutes     0.9
20 days        1 hour         0.8
15 days        2 hours        0.7
10 days        5 hours        0.6
7 days         8 hours        0.5
5 days         1 day          0.4
2 days         2 days         0.3
1 day          3 days         0.2
Few hours      5 days         0.1
Immediately    7 days         0


Table 2. Scores for R5 and R6

Linguistic value       Score
Very attribute         1
More than attr.te      0.9
Normally attr.te       0.8
Enough attr.te         0.7
Almost attr.te         0.6
Little attr.te         0.5
Not almost attr.te     0.4
Not enough attr.te     0.3
Not attr.te            0.2
Hardly ever attr.te    0.1
Very not attr.te       0

R8 - Weight of plates: the m.f. is defined by an indirect method with one expert; it is a decreasing function
of weight. When the weight of the plates is high, a large and heavy supporting structure is needed, the
current drawn by the motors is high, and costs consequently increase.

R9 - Safety: this represents the behaviour of the system when an unforeseen event occurs, e.g. a chain block
caused by scrap jamming or an accidental load. The m.f. is S(t, p) = p_t S_t + p_p S_p, where S_t and S_p are
the traction and vertical load scores, and p_t and p_p the corresponding weights.


Table 3. S_p and S_t scores.

Outcome                            Score
No deformations                    1
Elastic deformations               0.8
Plastic deformations               0.6
Probably dangerous deformation     0.4
Dangerous deformation              0.2
Fault                              0
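Three of the membership functions described above are specified closely enough to be sketched in code. The following Python fragment is a hedged illustration rather than the authors' implementation: the function names are hypothetical, the linear ramp for R4 between G = 0.6 and G = 1.0 is an assumption (the text says only that the m.f. decreases to 0 on that interval), the gap is assumed to be in millimetres, and the equal weights used for R9 are placeholders because the weights p_t and p_p are not given.

```python
def mu_investment_cost(cost_m, max_cost_m=50.0):
    """R1: decreases linearly from 1 at zero cost to 0.5 at the maximum cost (50M)."""
    if not 0.0 <= cost_m <= max_cost_m:
        raise ValueError("cost must lie in [0, max_cost_m]")
    return 1.0 - 0.5 * cost_m / max_cost_m


def mu_scrap_loss(gap):
    """R4: 1 below a gap of 0.6, 0 above 1.0; the linear ramp between is assumed."""
    if gap < 0.6:
        return 1.0
    if gap > 1.0:
        return 0.0
    return (1.0 - gap) / 0.4


# Scores from Table 3, used for both the traction and the vertical load.
DEFORMATION_SCORE = {
    "No deformations": 1.0,
    "Elastic deformations": 0.8,
    "Plastic deformations": 0.6,
    "Probably dangerous deformation": 0.4,
    "Dangerous deformation": 0.2,
    "Fault": 0.0,
}


def mu_safety(traction_outcome, vertical_outcome, p_t=0.5, p_p=0.5):
    """R9: S(t, p) = p_t * S_t + p_p * S_p; the default weights are placeholders."""
    s_t = DEFORMATION_SCORE[traction_outcome]
    s_p = DEFORMATION_SCORE[vertical_outcome]
    return p_t * s_t + p_p * s_p


example_cost = mu_investment_cost(25.0)   # halfway between 1 and 0.5
example_gap = mu_scrap_loss(0.8)          # midpoint of the assumed ramp
example_safety = mu_safety("Elastic deformations", "Dangerous deformation")
```

The composite score for R3 in Table 1 has the same weighted-sum form as `mu_safety` and could be handled the same way once its weights are known.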

Examination of each design alternative by an expert allows a membership value to be assigned to each
requirement. Moreover, the experts establish the relative importance of each requirement by assigning it a
weight in the global evaluation. Table 4 shows the membership values and the weights w_i for the three
alternatives.


Table 4. Membership values and weights.

        x1      x2      x3      w_i
R1      0.81    0.92    0.91    0.70
R2      0.47    0.37    1.00    0.80
R3      0.68    0.70    0.86    1.00
R4      1.00    0.75    0.75    0.75
R5      0.80    0.80    0.60    1.00
R6      0.50    0.80    0.80    0.60
R7      0.50    0.50    0.70    0.40
R8      0.55    0.68    1.00    0.90
R9      0.52    0.72    0.56    1.00

The values shown in Table 4 are combined through Eq. (3) to compute the overall evaluation of each design
alternative x_j with respect to the industry target. The results in the first row of Table 5 indicate that x3
has the minimum value and is therefore better than x1 and x2. Obviously, the choice could be influenced by
the assigned weights and the corresponding requirements. To verify this, the other rows give the overall
evaluation when the weight of one requirement is set to 0, which means that the corresponding requirement
is ignored. For this application we note that the design alternative x3 is better than x1 in every case, and
better than x2 in every case except w2 = 0, where the two scores are nearly equal (0.08 against 0.07).
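The overall scores and the sensitivity analysis can be reproduced directly from the Table 4 data. The Python sketch below is one possible way to do this, assuming only Eq. (3) and the table values; the variable and function names are hypothetical, and the fuzziness index in the last column of Table 5 is not computed because its definition is not given in the text.

```python
# Membership values from Table 4: one row per requirement R1..R9,
# one column per alternative (x1, x2, x3).
MEMBERSHIP = [
    (0.81, 0.92, 0.91),
    (0.47, 0.37, 1.00),
    (0.68, 0.70, 0.86),
    (1.00, 0.75, 0.75),
    (0.80, 0.80, 0.60),
    (0.50, 0.80, 0.80),
    (0.50, 0.50, 0.70),
    (0.55, 0.68, 1.00),
    (0.52, 0.72, 0.56),
]
WEIGHTS = [0.70, 0.80, 1.00, 0.75, 1.00, 0.60, 0.40, 0.90, 1.00]


def overall_evaluation(column, weights):
    """Eq. (3) for one alternative: weighted mean of the losses (1 - mu)^2."""
    weighted = sum(w * (1.0 - row[column]) ** 2 for row, w in zip(MEMBERSHIP, weights))
    return weighted / sum(weights)


def evaluate_all(weights):
    return tuple(overall_evaluation(j, weights) for j in range(3))


def without_requirement(k):
    """Scores when the weight of requirement R(k+1) is set to zero."""
    weights = list(WEIGHTS)
    weights[k] = 0.0
    return evaluate_all(weights)


baseline = evaluate_all(WEIGHTS)  # rounds to 0.15, 0.11, 0.07
sensitivity = [without_requirement(k) for k in range(len(WEIGHTS))]

for k, scores in enumerate(sensitivity, start=1):
    print(f"w{k} = 0:", " ".join(f"{s:.2f}" for s in scores))
```

Printing the unrounded scores alongside the rounded ones is a convenient way to see how close two alternatives are when the table shows them one hundredth apart.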




The fifth column of Table 5 shows the fuzziness index I_s of the set X = {x1, x2, x3}, where 0 indicates
certainty of choice and 1 total indecision. The index allows the quality of the problem definition to be
evaluated. The requirement that contributes most to indecision is R5, because I_s(w5 = 0) = 0.78; that is,
if we do not consider R5 the fuzziness decreases, whereas if R8 is ignored the fuzziness index increases to
I_s(w8 = 0) = 0.87.


Table 5. Scores and influence of every requirement.

Case           x1      x2      x3      I_s
all weights    0.15    0.11    0.07    0.83
for w1 = 0     0.16    0.12    0.07    0.83
for w2 = 0     0.13    0.07    0.08    0.82
for w3 = 0     0.15    0.11    0.08    0.84
for w4 = 0     0.16    0.12    0.07    0.81
for w5 = 0     0.17    0.12    0.05    0.78
for w6 = 0     0.14    0.12    0.07    0.86
for w7 = 0     0.14    0.10    0.07    0.83
for w8 = 0     0.14    0.11    0.08    0.87
for w9 = 0     0.13    0.12    0.05    0.82

Table 6 shows the domination degree matrix [d_ij], which indicates how much a design alternative x_i is
better or worse than x_j. For d_ij > 0.5, alternative x_i is better than x_j, and vice versa for d_ij < 0.5.


Table 6. Dominance degree matrix.

        x1      x2      x3
x1      0.50    0.57    0.68
x2      0.43    0.50    0.62
x3      0.32    0.38    0.50


CONCLUSION 

The choice between available design alternatives presupposes an overall evaluation of the
alternatives with respect to the assigned requirements. For vague and subjective requirements, the fuzzy
approach is very useful for representing and manipulating them. In this approach, however, an
expert's knowledge is particularly important, both for obtaining the m.f. of each requirement and for
selecting an aggregation function to compute the overall evaluation. This paper has demonstrated the
usefulness of the fuzzy approach for solving a practical design problem, utilising an aggregation function
derived from the Taguchi philosophy. We also note that the presented application is a real case of a design
choice faced by an Italian mechanical company.


REFERENCES 

1. J.D. Jones, Y. Hua, 1998. A fuzzy knowledge base to support routine engineering design. Fuzzy Sets
and Systems, 98, 267-278.

2. E.K. Antonsson, K.N. Otto, 1995. Imprecision in Engineering Design. ASME Journal of Mechanical
Design, 117, 25-31.

3. N. Cappetti, E. Santoro, 1998. An Application of Computer Visualisation for Solving a Mechanical
Design by Fuzzy Set. IEEE Inter. Conf. on Inform. Visualisation, London, 60-75.

4. M.J. Scott, E.K. Antonsson, 1998. Aggregation functions for engineering design trade-off. Fuzzy Sets and
Systems, 99, 253-264.

5. G.J. Klir, B. Yuan, 1995. Fuzzy Sets and Fuzzy Logic. Prentice Hall.

6. R. Ranjit, 1990. A Primer on the Taguchi Method. Van Nostrand Reinhold.

7. A. Donnarumma, M. Pappalardo, A. Pellegrino, 1998. Classification and information using fuzzy design.
APE 98, Warsaw (Poland).

8. N. Cappetti, A. Donnarumma, E. Santoro, 1998. A fuzzy approach to design evaluation of an apron
conveyor for a mechanical industry. 7th Int. Sci. Conference "Achiev. Mech. & Mat. Engineering", Ed.
L.A. Dobrzanski, Poland, 59-62.




Non-Traditional  Performance  Analysis 


J.  Arlen  Cooper 

Sandia  National  Laboratories,* 
Albuquerque,  NM  87185-0490,  USA 
Email:  acooper@sandia.gov 


ABSTRACT 

Traditional  performance  analysis  methods  are  applicable  to  many  standard  problems,  including  those 
examples  illustrated  in  most  formal  courses.  However,  there  are  many  real-world  situations  for  which  non- 
traditional  methods  appear  to  be  more  appropriate,  mainly  because  most  practical  problems  involve 
substantial  subjectivity  about  the  inputs  and  models  used.  This  paper  surveys  some  of  the  most  applicable 
approaches  found  in  a  recent  research  study.  Each  approach  is  developed  individually  and  is  illuminated  by 
selecting  example  situations  of  apparent  applicability.  Then,  the  combinational  blending  of  the  approaches 
with  each  other  and  with  traditional  methodology  is  discussed. 

* Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the
United States Department of Energy under Contract DE-AC04-94AL85000.


INTRODUCTION 

In  most  analyses,  conventional  approaches  are  typically  employed,  even  where  they  may  not  match  the 
problem  parameters  very  well.  For  example,  most  people  frequently  express  their  thoughts 
probabilistically,  even  when  actually  thinking  possibilistically.  To  illustrate,  one  might  be  tempted  to  use  a 
high  school  transcript  showing  half  As  and  half  Bs  for  forecasting  a  "probability"  of  a  student  getting  an  A 
in  a  university  course  as  50%.  This  associates  a  frequentist  concept  with  a  possibilistic  event  (it  is  judged 
equally  possible  that  the  grade  will  or  will  not  be  A,  where  "not  A"  =  {B,C,D,F}  and  there  is  positive 
weight  only  for  B).  There  is  no  direct  evidence  prior  to  receiving  college  course  grades. 

Carrying  this  a  step  further,  it  is  often  necessary  to  associate  an  "extreme"  prognosis  with  an  event  that 
might  cause  an  unexpected  safety  or  reliability  problem.  As  an  analogy,  one  might  forecast  the  probability 
of  the  student  above  getting  five  As  in  five  courses  as  1/32.  In  practice,  there  is  a  strong  possibility  that  this 
perception  is  misleading. 

For  example,  the  student  may  place  high  importance  on  university  grades  and  be  inspired  to  work  very 
hard,  getting  straight  As.  This  might  help  explain  why  historical  examination  of  accident  histories  has 
shown  a  tendency  of  prior  traditional  safety  analyses  to  be  overly  optimistic,  and  why  possibilistic 
mathematical  approaches  are  currently  receiving  considerable  attention.  In  this  paper,  we  consider  a 
number  of  non-traditional  methods,  each  of  which  can  be  used  to  advantage  in  particular  classes  of 
problems.  Subsequently,  we  describe  a  hybrid  technique  that  can  be  applied  to  problems  corresponding  to 
diverse models.


FUZZY  MATHEMATICS 

Fuzzy  mathematics  is  a  form  of  possibilistic  processing.  The  difference  between  fuzzy  logic  and  fuzzy 
mathematics  is  roughly  analogous  to  the  difference  between  Boolean  algebra  and  the  algebra  of  real 
numbers. Like probabilistic calculus, fuzzy mathematics can also be applied to introduce variability to
fixed  parameters.  For  example,  a  subjective  parameter  can  have  some  of  the  characteristics  of  more  than 
one  number  (e.g.,  "approximately"  five  may  indicate  a  range  of  real  numbers  including,  but  not  limited  to, 
five). Fuzzy models can therefore be applied to describe parameters in analyses (e.g., probability analysis),
and this has some similarity to strictly conventional descriptions. However, fuzzy algebra differs from
conventional mathematics both formally and in concept.

Since  fuzzy  mathematics  does  not  assume  the  precise  relationships  inherent  in  probability  distributions,  it 
appears  to  be  more  appropriate  for  the  subjective  inputs  applicable  to  vaguely  defined  environments. 

A  fuzzy  number  (formally  a  convex  and  normal  fuzzy  set)  can  be  represented  mathematically  [1]  as: 

$$\mu_A(x) = A_\mu = [a_1(\mu),\, a_2(\mu)] \qquad (1)$$

where the $a_1$ and $a_2$ values on x represent the lower and upper limits, respectively, of the variation
possible for the parameter as a function of $\mu$, and $\mu$ is a "level of presumption." The level of
presumption represents a collection of subjective judgments¹ about the range specified.

One must be more presumptuous in order to specify monotonically decreasing variable ranges (maximum
level of presumption is presumption of minimum uncertainty). This is associated with the convexity
property. The "normal" restriction fixes the maximum level of presumption at 1 and the minimum at 0. If a
particular value of presumption, $\alpha$, is selected, a horizontal line can be drawn that intersects the
ordinate at $\mu = \alpha$. The two points where the line intersects the function represent the lower and
upper bounds for the parameter at the specified presumption level. No information is implied within the
limits.
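The construction just described, a horizontal cut through the membership function at a chosen presumption level, can be sketched for the simplest case. The example below assumes a triangular fuzzy number, which is only one admissible convex and normal shape and is not prescribed by the text; the function name and the numbers chosen for "approximately five" are hypothetical.

```python
def presumption_interval(low, peak, high, alpha):
    """Lower and upper bounds of a triangular fuzzy number at presumption level alpha.

    The number has membership 0 at `low` and `high` and membership 1 at `peak`.
    """
    if not low <= peak <= high:
        raise ValueError("expected low <= peak <= high")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    lower = low + alpha * (peak - low)
    upper = high - alpha * (high - peak)
    return lower, upper


# "Approximately five", modelled with an asymmetric spread (illustrative values).
widest = presumption_interval(4.0, 5.0, 6.5, 0.0)    # (4.0, 6.5)
middle = presumption_interval(4.0, 5.0, 6.5, 0.5)    # (4.5, 5.75)
narrowest = presumption_interval(4.0, 5.0, 6.5, 1.0)  # (5.0, 5.0)
```

The intervals shrink as the presumption level rises, which matches the statement that maximum presumption corresponds to minimum uncertainty.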

In  contrast  to  fuzzy  logic,  fuzzy  mathematics  treats  operands  as  fuzzy  (subjectively  known)  numbers  with 
uncertainty  along  the  abscissa,  and  computes  in  terms  of  abscissa  values  rather  than  ordinate  values.  The 
mathematical  basis  for  combining  fuzzy  numbers  is  based  on  Zadeh’s  Extension  Principle  [2].  For  addition 
and  multiplication  (basic  to  fault  tree  and  event  tree  computations),  this  produces: 

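$$\mu_{A+B}(z) = \bigvee_{z = x + y} \left( \mu_A(x) \wedge \mu_B(y) \right) \qquad (2)$$

$$\mu_{A \times B}(z) = \bigvee_{z = x \times y} \left( \mu_A(x) \wedge \mu_B(y) \right) \qquad (3)$$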

These are convolutions basically constructed like those used in probabilistic calculus, but without the
"independent" abscissa valuations. This is because the above fuzzy-algebra operations utilize only ranges of
values, and make no use of or assumptions about relationships between probability parameters, or of
independence between probability parameters.² The operations shown above are directly useful for
parameters for which relative probabilities and independence are not well known (a common situation). On
the other hand, probabilistic operations are limited to parameters for which these characteristics are well
known (a less common situation).
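The max-min form of the extension principle used for addition and multiplication can be illustrated on small discrete fuzzy numbers. This Python sketch is an assumption-laden illustration, not the tool described in the paper: fuzzy numbers are represented as dictionaries mapping a value to its membership, the names are hypothetical, and a practical implementation for continuous fuzzy numbers would more likely work with intervals at each presumption level.

```python
from itertools import product


def extend(operation, a, b):
    """Apply a binary operation to two discrete fuzzy numbers.

    For each result z, the membership is the maximum over all pairs (x, y)
    with operation(x, y) == z of min(mu_a(x), mu_b(y)).
    """
    result = {}
    for (x, mu_x), (y, mu_y) in product(a.items(), b.items()):
        z = operation(x, y)
        result[z] = max(result.get(z, 0.0), min(mu_x, mu_y))
    return result


def fuzzy_add(a, b):
    return extend(lambda x, y: x + y, a, b)


def fuzzy_multiply(a, b):
    return extend(lambda x, y: x * y, a, b)


# Illustrative operands on an integer grid.
about_five = {4: 0.5, 5: 1.0, 6: 0.5}
about_two = {1: 0.5, 2: 1.0, 3: 0.5}

total = fuzzy_add(about_five, about_two)
times = fuzzy_multiply(about_five, about_two)
```

Here `total` peaks at 7 and `times` peaks at 10, and no independence assumption enters the calculation; only the ranges and their memberships are combined.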


MODIFIED  DEMPSTER-SHAFER  THEORY 

Another  concept  under  examination  is  an  "enhanced"  Dempster-Shafer  theory.  Dempster-Shafer  theory 
represents  both  frequentist  and  Bayesian  concepts,  separating  supportive  evidence  and  contrary  evidence 
for  particular  situations.  The  supplement  we  are  pursuing  is  to  represent  the  amount  of  subjectivity  in  the 
two  components,  which  allows  decision-makers  (looking  at  analysis  results)  to  track  more  precisely  how 
much  subjectivity  was  included.  An  illustration  of  the  use  of  this  technique  is  shown  in  Figure  1. 

Here, we postulate the projection of a final "average" based on partial data. It is further postulated that the
supporting evidence and contrary evidence are partially applicable, so that with a higher degree of
subjectivity, tighter bounds are derived. This has the value of not only indicating the bounds supported by
the evidence, but also indicating how much subjectivity was used in applying the bounds.

¹ Preferably from "experts," preferably based on data (even if limited), and possibly weighted according to expertise.
² However, treatment of independence/dependence properties is not precluded [3].


[Figure 1: plot with labels "Subjectivity", "Supporting Evidence", "Contrary Evidence" and "Current Average"; abscissa ticks 0.300, 0.350, 0.400]

Fig. 1. Subjective/Dempster-Shafer Model.


SOFT  MATHEMATICAL  AGGREGATION 

Soft  mathematical  aggregation  is  useful  in  a  significant  number  of  applications.  Inputs  may  contribute  to 
the  output  without  being  related  to  linear,  Boolean,  or  possibilistic  mathematics.  For  example,  a  production 
line  employee  who  is  disgruntled  or  unmotivated,  or  a  training  program  that  is  not  done  skillfully  might  not 
directly  contribute  to  an  accident,  but  the  presence  of  such  situations  projects  safety  concerns  and  potential 
contributors  to  an  accident,  if  other  unfavorable  events  occur. 


As  another  example,  a  medical  doctor  may  accumulate  weighted  health  information  combined  non-linearly 
(weight/height,  blood  pressure,  temperature,  pulse  rate,  blood  test  parameters,  reflexes).  Safety  indicators 
are  similar.  The  potential  effectiveness  of  protective  measures  (e.g.,  medicine)  is  also  weighed.  In  these  and 
similar  applications,  there  is  a  particular  need  for  contributions  that  push  toward  a  limit  (e.g.,  "unsafe"  or 
"safe")  without  ever  being  assured  of  reaching  the  limit.  Our  model  for  these  situations  is  exponential,  as 
shown  in  Figure  2.  Safety  protective  measures  are  aggregated  up  the  ordinate  and  threats  are  aggregated 
down  the  ordinate.  The  abscissa  indicates  a  weighted  "rating"  function  that  is  subjectively  obtained  and 
based  on  expert  judgment.  The  equation  used  is: 


$$f = \left(1 - e^{-\sum_i k\, w_i x_i}\right) e^{-\sum_j k\, v_j y_j} \qquad (4)$$


The $w_i$ and $v_j$ indicate "weights" on the importance of the protective measure and threat aggregates,
respectively (n and m in number). The weights are normalized so that $\sum_{i=1}^{n} w_i = 1$ and
$\sum_{j=1}^{m} v_j = 1$. The $x_i$ and $y_j$ are ratings of how bad the hazards are on a scale of 0 to 1.
The value of k depends on the number of aggregate constituents. Figure 2 shows an example aggregation of
threats and controls. The aggregation can be carried out with the parameters combined in any order, or it
can be carried out for the entire system.

Figure  2  shows  growth  of  the  aggregate  attribute  with  three  contributing  controls  and  three  types  of 
concerns.  Also  shown  is  a  "threshold  of  concern",  which  is  conceptually  a  fuzzy  threshold,  and  below 
which  system  concern  becomes  greater. 
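The behaviour described in this section, contributions that push an aggregate toward a limit without ever reaching it, can be illustrated with a short sketch. It assumes that the aggregate has the form f = (1 - exp(-k Σ w_i x_i)) · exp(-k Σ v_j y_j), which is one reading of the printed equation and should be checked against the original; the function name, the value of k and the example ratings are all hypothetical.

```python
import math


def soft_aggregate(control_ratings, control_weights,
                   threat_ratings, threat_weights, k=3.0):
    """Assumed soft aggregation: controls push f up toward 1, threats pull it toward 0.

    Ratings are on a 0-to-1 scale; each weight list is expected to sum to 1.
    The constant k is a placeholder, not a value taken from the paper.
    """
    for weights in (control_weights, threat_weights):
        if abs(sum(weights) - 1.0) > 1e-9:
            raise ValueError("weights must be normalized to sum to 1")
    control_sum = sum(w * x for w, x in zip(control_weights, control_ratings))
    threat_sum = sum(v * y for v, y in zip(threat_weights, threat_ratings))
    return (1.0 - math.exp(-k * control_sum)) * math.exp(-k * threat_sum)


# Three controls and three threats, as in the example discussed for Figure 2.
controls = [0.8, 0.6, 0.9]
control_weights = [0.5, 0.3, 0.2]
threats = [0.2, 0.1, 0.3]
threat_weights = [0.4, 0.4, 0.2]

base = soft_aggregate(controls, control_weights, threats, threat_weights)
worse = soft_aggregate(controls, control_weights, [0.9, 0.8, 0.7], threat_weights)
unprotected = soft_aggregate([0.0, 0.0, 0.0], control_weights, threats, threat_weights)
```

Under this assumed form the result always stays below 1 for finite ratings, and raising any threat rating lowers the aggregate, which is the qualitative behaviour the section describes.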


MULTI-PARAMETER  DECISION  ANALYSIS 

Multi-parameter  decision  analysis  is  also  an  important  tool,  because  system  designers  choose  components 
under  cost,  performance,  size,  safety,  and  reliability  constraints.  System  usage,  modification,  and  control 
are  determined  by  multiple  parameters.  System  analysts  also  weigh  all  of  the  above  factors  and  select 
mathematical  models.  The  high  degree  of  subjectivity  and  the  mathematical  constraints  involved  have 
made this a target for subjective mathematical tool development [4]. A few of the many techniques that are
commonly  used  are:  weighted  sum,  threshold  logic,  propositional  logic,  and  ordered  weighted  averaging. 




The  process  we  have  developed  is  a  weighted  sum  that  can  be  either  linear  or  softly  aggregated.  The 
method  also  includes  uncertainty  characteristics.  An  example  is  shown  below  in  Table  1.  The  scenario 
shown  demonstrates  how  two  different  viewpoints  of  the  same  data  set  can  yield  different  conclusions. 


[Figure 2 (image not reproduced); surviving axis label: Threat/control]


HYBRID  COMBINATION 

The  above  approaches,  numerous  others  under  study,  and  traditional  analyses  could  be  appropriate 
individually  for  a  particular  part  of  a  system  analysis.  However,  it  is  more  likely  that  combinations  might 
generally  be  necessary.  For  this  reason,  a  hybrid  mathematical  structure  has  been  developed  that  allows  the 
models  (traditional  and  non-traditional)  to  be  combined  while  tracking  the  amount  of  subjectivity  involved 
and  portraying  its  source.  Our  solution  treats  each  part  of  a  problem  as  a  subsystem,  and  each  subsystem 
can have an objective and a subjective constituent. Initially, each subsystem is addressed separately. We
then carry the analysis forward to combine subsystems, processing the objective and subjective portions
separately to derive a two-part result. The solutions are currently being incorporated into software tools [4].


Table 1. Combining Parameter Importance and Quality to Reach an Aggregate Decision.

Example factors                    Safety weighting   System weighting   Attribute score
Safety contribution                0.3-0.4            0.1-0.2            0.6-0.9
Bypass potential                   0.3-0.4            0.1-0.2            0.5-0.8
Synergism with other features      0.03-0.07          0.1-0.2            0.4-0.6
Lifetime stability                 0.1-0.2            0.05-0.15          0.2-0.5
Produceability                     0.01-0.03          0.05-0.15          0.9-1.0
Reliability                        0.01-0.03          0.05-0.15          0.9-1.0
Size                               0.01-0.03          0.05-0.15          0.9-1.0
Cost                               0.05-0.15          0.05-0.15          0.8-1.0
Operability                        0.05-0.15          0.05-0.15          0.8-1.0

Safety weighted score:  0.62-0.73
Systems weighted score: 0.74-0.85
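The way two weightings of the same attribute scores lead to different aggregate results can be sketched as follows. This is only an illustration under stated assumptions: the exact linear or soft aggregation used for Table 1 is not given in the text, so the sketch uses a plain normalized weighted sum over interval midpoints, and the function names are hypothetical.

```python
# Illustrative sketch only: a normalized weighted sum over interval midpoints.
# The aggregation actually used for Table 1 is not specified in the text.

def midpoint(interval):
    """Midpoint of a (low, high) interval."""
    low, high = interval
    return (low + high) / 2.0


def weighted_score(weights, scores):
    """Normalized weighted sum of attribute scores, using interval midpoints."""
    w = [midpoint(iv) for iv in weights]
    s = [midpoint(iv) for iv in scores]
    return sum(wi * si for wi, si in zip(w, s)) / sum(w)


# Intervals transcribed from Table 1, in table row order.
safety_w = [(0.3, 0.4), (0.3, 0.4), (0.03, 0.07), (0.1, 0.2), (0.01, 0.03),
            (0.01, 0.03), (0.01, 0.03), (0.05, 0.15), (0.05, 0.15)]
system_w = [(0.1, 0.2)] * 3 + [(0.05, 0.15)] * 6
scores = [(0.6, 0.9), (0.5, 0.8), (0.4, 0.6), (0.2, 0.5), (0.9, 1.0),
          (0.9, 1.0), (0.9, 1.0), (0.8, 1.0), (0.8, 1.0)]

safety_score = weighted_score(safety_w, scores)
system_score = weighted_score(system_w, scores)
print(round(safety_score, 3), round(system_score, 3))
```

Under this midpoint assumption the system-weighted score comes out higher than the safety-weighted score, which mirrors the ordering of the two aggregate ranges reported in the table.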

REFERENCES 

1.  A.  Kaufmann  and  M.  Gupta,  1991.  Introduction  to  Fuzzy  Arithmetic,  Van  Nostrand  Reinhold,  NY,  NY. 

2. L. Zadeh, 1965. Fuzzy Sets, Information and Control, 8, 338-353.

3.  J.  Cooper,  1994.  Fuzzy-Algebra  Uncertainty  Analysis,  Journal  of  Intelligent  &  Fuzzy  Systems,  2(4). 

4.  J.  Cooper  and  T.  Ross,  1997.  An  Investigation  of  New  Mathematical  Structures  for  Safety  Analysis, 
Sandia  National  Laboratories  Report  SAND97-2695,  November. 




Methods  to  Create  Membership  Functions 
in  Fuzzy-Rules  in  Knowledge-based  Systems 

Cezary  Orlowski 

Faculty  of  Management  and  Economics 
Technical  University  of  Gdansk 
Ul. Narutowicza 11/12, 80-952 Gdansk, Poland
+  48  58  347  24  55  cor@sunrise.pg.gda.pl. 


ABSTRACT 

The paper presents the author's experience in building complex systems with knowledge bases. The
systems were developed using the author's own method, which is based on concurrent engineering principles
and uses elements of fuzzy logic in a network environment.

The adopted method ensures the following:

• ease of creation and verification of membership functions in situations where engineers co-operate
with experts.

• efficiency in creating the rules in concurrent engineering.

• a reduced number of errors when testing the discussion path in the graphical work environment.

Examples of domains and ways of creating the system, with special emphasis on the membership
function, are presented in the paper.


INTRODUCTION 

Computer-aided decision making in systems with knowledge bases requires techniques that allow decisions
presented in the form of a semantic network and in natural language to be implemented in terms of
multi-valued logic.

It has been assumed that the description of the rules will be based on the linguistic models applied by Zadeh
[8] and modified later by Tong and Gupta [3]. To elaborate the method, the linguistic models were
constructed from knowledge-base rules in IF-THEN form, which were in turn formed on the basis of semantic
networks. The proposed linguistic model makes use of Mamdani-type inference: IF U is $B_j$ THEN V is $D_j$,
where the fuzzy relation $R_j$ is based on the product of the fuzzy sets $B_j$ and $D_j$.
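A rule of this kind can be sketched on discrete universes as follows. The sketch assumes that the "product" of the two fuzzy sets is the min-based Cartesian product commonly associated with Mamdani inference and that inference uses max-min composition; the paper does not state either choice, and the membership values and function names are hypothetical.

```python
# Minimal sketch of a Mamdani-type rule "IF U is B THEN V is D" on discrete universes.
# Assumptions: the relation is the min-based Cartesian product, and inference
# uses max-min composition. Membership values below are made up for illustration.

def mamdani_relation(mu_b, mu_d):
    """Fuzzy relation R(u, v) = min(mu_B(u), mu_D(v))."""
    return [[min(b, d) for d in mu_d] for b in mu_b]


def infer(mu_input, relation):
    """Max-min composition of a fuzzy input with the rule relation."""
    n_out = len(relation[0])
    return [max(min(a, row[j]) for a, row in zip(mu_input, relation))
            for j in range(n_out)]


mu_b = [0.0, 0.5, 1.0, 0.5]   # fuzzy set B over four points of universe U
mu_d = [0.2, 1.0, 0.4]        # fuzzy set D over three points of universe V
R = mamdani_relation(mu_b, mu_d)

# A crisp observation at the second point of U, written as a fuzzy singleton.
out = infer([0.0, 1.0, 0.0, 0.0], R)
print(out)
```

The output set is D clipped at the degree to which the observation matches B, which is the behaviour the later defuzzification step relies on.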


PROPOSED  SOLUTION 

A qualitative solution has been elaborated that takes advantage of the phenomenological approach to solve
the problem of constructing the membership function when designing hybrid systems.

The induction approach, which has been defined before, is taken into consideration. An attempt has been
made to apply the deduction method using the Mesarovic approach [4]. The experience acquired favoured
the Weinberg approach [7], which suggests the researcher's intuition as a cognitive tool for working out
the method. By using both the descending and the ascending approach, and by assuming that the task is
performed according to rules governing the creation of complex systems, it was possible to define, on the
one hand, the objective of building the system and, on the other, the method for its realisation. To carry
out the task, a multi-level system assessment of the tasks and processes is therefore suggested, taking
into consideration:

• separation of tasks necessary to attain the objective in view of the requirements of the system;


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




• determination of the system processes related to the accomplishment of a particular task;

• determination of the requirements necessary for the existence of the processes resulting from the tasks
that lead to the attainment of the aim of the hybrid system. The aim of the system is determined during
the phase dealing with the analysis of informal requirements.

The work presents an alternative approach to creating hybrid systems. The method is the result of a search
for system methods of building intelligent systems based on knowledge processing. It consists of models
of the following:

• Multiplane division of the design cycle, comprising project management, design tools, sources of
knowledge and assisting techniques.

• Multilevel division into design ranges.

• Decomposition of design processes for the sake of acquiring knowledge, using parallel solutions at
the knowledge-base level and serial ones at the level of the processes, with feedback controlled by external
domain experts, and their usage for multilevel assessment.

• Management of the process with the specified scopes of design solutions and the selection of solutions
for those scopes based on the processes.

• Integration of the technical, institutional and organisational functions of the design team for the model
of team management defined as the working team.

• Serial design processes with feedback in the parallel design areas and at the specified levels.

• Groups of products together with the method of defining them by the design team.

The elaboration of the method has been encouraged by Rattray's definition [114] of the division of the
design cycle and its modification to separate out the various ranges of the investigations carried out. Use
has also been made of theoretical models [80,85,118,119,144] in the elaboration of the method. The ranges
related to the processes of constructing the system have been defined. Range I is related to the
transformation of knowledge by synthesis and analysis, whereas Range II refers to the design processes. The
synthesis and analysis processes, as well as the design processes, occur on four parallel planes (Fig. 1).


[Figure: parallel planes spanning FUNCTIONS, RANGE I, ATTRIBUTES, RANGE II and PRODUCTS]

Fig. 1. Model of division of the design cycle into parallel planes and design ranges [own elaboration]




RANGE  I 

Range I includes synthesis and analysis processes, using the denotations expressed by Equations 1 and 2:

Synthesis = (functions of the system)  (1)

Analysis = (attributes of the system)  (2)


The processes of synthesis and analysis include events connected with the transformation of experts'
knowledge into formalised expressions. It is suggested that knowledge engineers control the knowledge
transformation process. Fig. 2 presents a model of knowledge acquisition.

Fig. 2. Model of knowledge acquisition during formalisation processes [own elaboration]

where:

G - group of branch experts,

$G_i$ - group of knowledge engineers,

$M_i$ - serial process of transferring knowledge between an expert and a knowledge engineer,

M - process of transferring formalised knowledge,

$U_e$ - input data,

Y - output data,

$P_1$ - $P_n$ - design processes.

Acquisition of knowledge is defined as the set D made up of the following elements:

$D = \langle M, M_i, U_e, Y, P, G, G_i \rangle$  (3)

The process of acquiring knowledge P is defined as:

$P : (M + M_i) \times U_e \rightarrow Y$  (4)

and depends directly on the value $M + M_i$, i.e., on the way knowledge is transformed by the experts and
knowledge engineers.


RANGE  II 

The design processes are defined as serial processes involving feedback that occur in parallel design
areas:

Design processes = $(P_{11}, \ldots, P_{mn})$  (5)

The matrix of the serial processes (13) comprises the processes $P_{11}, \ldots, P_{mn}$, which take place
independently in the areas $O_1, \ldots, O_m$ defined by the vector O. The products $p_{r1}, \ldots, p_{rm}$
are the result of these processes. A parallel transformation process is illustrated in Fig. 3.




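$$
O = \begin{bmatrix} O_1 \\ O_2 \\ O_3 \\ \vdots \\ O_m \end{bmatrix}, \qquad
P = \begin{bmatrix}
P_{11} & P_{12} & P_{13} & \cdots & P_{1n} \\
P_{21} & P_{22} & P_{23} & \cdots & P_{2n} \\
P_{31} & P_{32} & P_{33} & \cdots & P_{3n} \\
\vdots & \vdots & \vdots &        & \vdots \\
P_{m1} & P_{m2} & P_{m3} & \cdots & P_{mn}
\end{bmatrix}, \qquad
p_r = \begin{bmatrix} p_{r1} \\ p_{r2} \\ p_{r3} \\ \vdots \\ p_{rm} \end{bmatrix}
$$

Fig. 3. Model of serial design processes accompanied by feedback generated in the parallel design
areas [own elaboration].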

The final product is obtained by successively producing the following groups of products, in
compliance with Fig. 4:

$G_{pr1} \rightarrow G_{pr2} \rightarrow G_{pr3} \rightarrow \cdots \rightarrow G_{prm}$

Fig. 4. Model of groups of products [own elaboration]

where

$G_{pr1} = p_{r1} + p_{r2} + \ldots + p_{rm}$  (6)

$G_{pr2} = p_{r2} + p_{r3} + \ldots + p_{rm}$  (7)

$G_{prm} = G_p$

and $G_p$ is the final product.

EXAMPLE  OF  APPLICATION 

The way rules are implemented using the fuzzy logic technique in the proposed method is presented in
Fig. 5. The method makes it possible to define the membership function at the stage of setting up the
formal requirements for the system being designed. The fuzzy logic technique has been applied according to
the criteria set out in the description of the method, which relate to the accomplishment of the project.
The values entered into the system are identified with a group of fuzzy quantities and assigned to a given
set by a membership function, using values determined by experts. To carry out the fuzzification process
correctly, i.e., the assignment of fuzzy values to input quantities, use is made of various membership
functions, most frequently defined by the designer of the system. An example of the membership functions
applied to the project is given in Fig. 6.
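The fuzzification step can be sketched with triangular membership functions over traffic intensity. This is a minimal illustration, not the functions used in the project: the triangular shape, the linguistic terms and every breakpoint below are hypothetical, since in the described method such values would be supplied by experts and the system designer.

```python
# Minimal fuzzification sketch. The triangular shape, the term names and all
# breakpoints (vehicles per hour) are hypothetical, chosen only for illustration.

def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, rising to 1 at b (requires a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


TERMS = {
    "low": (0.0, 200.0, 500.0),
    "medium": (300.0, 600.0, 900.0),
    "high": (700.0, 1000.0, 1300.0),
}


def fuzzify(vehicles_per_hour):
    """Map a crisp traffic intensity to a membership degree for each term."""
    return {name: triangular(vehicles_per_hour, *abc) for name, abc in TERMS.items()}


degrees = fuzzify(400.0)
print(degrees)
```

With these made-up breakpoints, an intensity of 400 vehicles per hour belongs partly to "low" and partly to "medium", which is the kind of overlap the rules then operate on.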

When a system with knowledge bases was constructed using object techniques, the conditions and actions
of the rules were verified with fuzzy values. On completing the session and obtaining the specified fuzzy
quantities, their conversion into output values was carried out in the defuzzification process.


[Figure labels: decision solutions at natural language level; formalisation of solutions, construction of
membership function, fuzzification of data; decision solutions at formal language level; transformation of
rules; defuzzification of data; decision solutions at natural language level]

Fig. 5. Rule implementation by use of the fuzzy logic techniques [own elaboration].


[Figure: membership factor plotted against traffic intensity (number of vehicles per hour)]

Fig. 6. Example of rule implementation using fuzzy logic techniques [own elaboration].




Use has been made of the centre-of-mass technique, which consists of the following steps [4]:

• for the output, the membership functions are characterised by calculating their centroids;

• for the output function, a reducing procedure is used to decrease its value to the level at which it was
produced;

• the surface area of the reduced membership function is calculated.

The output value is then calculated as the weighted mean of the x coordinates of the centroids, with the
newly calculated areas serving as the weights.
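The steps above can be sketched numerically as follows. The sketch assumes triangular output membership functions, clips each one at the firing level of its rule, and integrates with a simple midpoint rule; the shapes, levels and function names are hypothetical and are not taken from the paper.

```python
# Sketch of centre-of-mass defuzzification. Assumptions: triangular output sets,
# clipping at the rule firing level, midpoint-rule integration. All numbers are
# hypothetical.

def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, rising to 1 at b (requires a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def clipped_area_and_centroid(abc, level, steps=2000):
    """Area and x-centroid of a triangular set reduced (clipped) to `level`."""
    a, b, c = abc
    dx = (c - a) / steps
    area = 0.0
    moment = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * dx
        mu = min(level, triangular(x, a, b, c))
        area += mu * dx
        moment += x * mu * dx
    centroid = moment / area if area > 0.0 else b
    return area, centroid


def defuzzify(rules):
    """Weighted mean of centroids, weighted by the clipped areas.

    `rules` is a list of ((a, b, c), firing_level) pairs.
    """
    parts = [clipped_area_and_centroid(abc, level) for abc, level in rules]
    total_area = sum(area for area, _ in parts)
    return sum(area * cx for area, cx in parts) / total_area


crisp = defuzzify([((0.0, 5.0, 10.0), 0.2), ((10.0, 15.0, 20.0), 0.8)])
print(round(crisp, 2))
```

Because the second rule fires more strongly, its clipped area is larger and the crisp output is pulled towards the centroid of the second set.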

The application of the method has made it possible to use fuzzy values (conditions or actions) in the
creation of rules, which, when an expert and a knowledge engineer co-operate, allows for a more
effective acquisition of knowledge.

REFERENCES 

1. Ceri, S., Fraternali, P., 1997. Database Applications with Objects and Rules, Addison-Wesley, Harlow.

2. Coad, P., Yourdon, E., 1991. Object Oriented Design, Prentice-Hall, Englewood Cliffs.

3. Gupta, M.M., Kiszka, J.B., Trojan, G.J., 1986. Multivariable structure of fuzzy systems, IEEE
Transactions on Systems, Man, and Cybernetics, SMC-16, 7, 638-656.

4. Mesarovic, M., Takahara, Y., 1989. Abstract Systems Theory, Lecture Notes in Control and
Information Science, Springer.

5. Mesarovic, M., Takahara, Y., 1970. General Systems Theory: Mathematical Foundation, Acad. Press.

6. Rattray, C., 1996. “Identification and Recognition through Shape in Complex Systems”, Computer
Aided Systems Theory - EUROCAST95 (eds. F. Pichler, R. Moreno Díaz, R. Albrecht), LNCS 1030,
Springer-Verlag.

7. Weinberg, G., 1979. Myslenie systemowe, Warszawa.

8. Zadeh, L.A., 1978. Fuzzy sets as a basis for theory of possibility, Fuzzy Sets and Systems 1, 3-28.




An  Efficient  Method  for  Constructing  Fuzzy  Rules 


Bojan  Novak,  Ivan  Rozman 

University  of  Maribor,  Faculty  of  Electrical  Engineering  and  Computer  Science, 

Smetanova  17,  2000  Maribor, 

Email: novakb@uni-mb.si


ABSTRACT 

Recent advances have merged artificial neural networks (ANNs) with fuzzy logic to generate and tune
membership functions, rules and inference systems automatically. Unfortunately, these tools are not simple
and can generate very complicated error surfaces with multiple local optima that are traps for the
learning algorithm. If the structure of the ANN (the number of neurons in the hidden layer) is not properly
defined, the actual error will remain high despite a low training error, i.e., over-fitting occurs. These
troubles can be avoided through the selection of adequate input variables and a proper ANN structure. With
clustering methods, rules and the optimal shape of membership functions can be generated automatically. In
this paper a different approach is considered. Instead of generating cluster centers, some vectors are
chosen by using certain described criteria. The structure of the learning machine is defined during
training. The Vapnik-Chervonenkis (VC) dimension is introduced as a measure of the capacity of the learning
machine. The expected error on yet unseen examples can be estimated with the help of the VC dimension. The
structural risk minimization principle is introduced to construct a machine with the lowest expected
error.

INTRODUCTION 

Recent  advances  have  merged  artificial  neural  networks  (ANNs)  with  fuzzy  logic  to  generate  automatically 
and  to  tune  membership  functions,  rules  and  inference  systems.  Three  basic  combinations  exist: 

-  neural-based  fuzzy  systems 

-  fuzzy-based  neural  networks 

-  fuzzy-neural  hybrid  systems. 

In the third approach, the best properties of both techniques are used. To this category belong adaptive
controllers such as FALCON (Fuzzy Adaptive Learning Control Network) and ANFIS (Adaptive
Neural Fuzzy Inference System). They are implemented in the form of a radial basis function network
(RBFN). The idea results from the theory of function approximation and also from the biologically
inspired theory of locally tuned and overlapping receptive fields, which are a well-known structure in
regions of the cerebral cortex and the visual cortex. RBFNs and fuzzy inference systems are functionally
equivalent, and this principle is applied in fuzzy-neural hybrid systems. Backpropagation
and recurrent ANNs are the ANNs most often used for non-linear modeling. Unfortunately, these
tools are not simple and can generate very complicated error surfaces with multiple local optima that are
traps for the learning algorithm. If the structure of the ANN (the number of neurons in the hidden layer) is
not properly defined, the actual error will remain high despite a low training error, i.e., over-fitting
occurs. These troubles can be avoided through the selection of adequate input variables and a proper ANN
structure. Different time-consuming methods, such as pruning, exist for defining an optimal structure, but
the problem of complicated error surfaces with multiple local optima still remains. With the addition of
clustering methods, rules and the optimal shape of membership functions can be generated automatically.
Basically, the idea of clustering is to generate new vectors, the cluster centers, in the center of the
areas where a cluster of data exists. The cluster centers are the basis for a decision about rule
generation. Another important fact is that in practical applications only very limited data are available
for learning. An ANN is capable of learning only on a data set that is large enough. Different validation
techniques exist to construct the ANN with a minimal actual error. These methods must extract a significant
amount of data into the validation set, so the training set becomes even smaller, which significantly
affects the quality of the ANN performance.




In  this  paper  a  different  approach  is  considered.  Instead  of  generating  cluster  centers,  some  vectors  are 
chosen  by  using  certain  described  criteria.  The  structure  of  the  learning  machine  is  defined  during  training. 
The  Vapnik  Chervonenkis  (VC)  dimension  is  introduced  as  a  measure  of  the  capacity  of  the  learning 
machine.  A  prediction  of  the  expected  error  on  the  yet  unseen  examples  can  be  estimated  with  the  help  of 
the  VC  dimension.  The  structural  risk  minimization  principle  is  introduced  to  construct  a  machine  with  the 
lowest expected error. As a result, validation is unnecessary, since a reliable upper bound on the
actual error is formulated based on the VC dimension. This is particularly important when
the data set is small.

The problem is transformed into a reproducing kernel Hilbert space, which is a very efficient way to
turn a non-linear problem into a linear one.

MATHEMATICAL  MODEL 

Consider k observations, each consisting of a pair $(x_i, y_i)$, where $x_i \in R^n$, $i = 1, \ldots, k$, is
the input vector and $y_i$ is the associated output. Training a learning machine amounts to building a
mapping $x \mapsto f(x, \alpha)$, where the functions $f(x, \alpha)$ are labeled by the adjustable
parameters $\alpha$. For artificial neural networks (ANNs), $\alpha$ represents the weights and biases. The
expectation of the test error for the trained machine is:

$R(\alpha) = \int \frac{1}{2} |y - f(x, \alpha)| \, dP(x, y)$  (1)

$R(\alpha)$ is the expected risk and P is a probability distribution. The mean error rate measured on the
finite number of observations is the "empirical risk":

$R_{emp}(\alpha) = \frac{1}{2k} \sum_{i=1}^{k} |y_i - f(x_i, \alpha)|$  (2)

$R_{emp}(\alpha)$ is fixed for a particular choice of $\alpha$ and for a particular training set
$\{x_i, y_i\}$, and the probability is not included in the equation. The quantity
$\frac{1}{2} |y_i - f(x_i, \alpha)|$ is the loss function. Empirical risk minimization, however, does not
imply a small error on the test set if the number of examples in the training data set is limited.
Structural risk minimization is one of the new techniques for handling a limited amount of data
efficiently. For a chosen $\eta$, $0 \le \eta \le 1$, the following bound holds with probability $1 - \eta$:

$R(\alpha) \le R_{emp}(\alpha) + \Phi\left(\frac{h}{k}, \frac{\log \eta}{k}\right)$  (3)

where $\Phi$ is defined as:

$\Phi\left(\frac{h}{k}, \frac{\log \eta}{k}\right) = \sqrt{\frac{h\left(\log\frac{2k}{h} + 1\right) - \log\frac{\eta}{4}}{k}}$  (4)


The parameter h is the Vapnik-Chervonenkis (VC) dimension [2,3,4]. It describes the capacity of the set of
functions implemented on the learning machine.
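How the capacity h and the sample size k enter the risk bound can be sketched numerically. The sketch assumes the standard form of Vapnik's confidence term, $\sqrt{(h(\log(2k/h) + 1) - \log(\eta/4))/k}$, added to the empirical risk; the function names and the sample values are hypothetical.

```python
import math

# Sketch of the VC confidence term and the resulting risk bound.
# Assumption: the standard form of the term from Vapnik's theory; the sample
# values of h, k and eta are made up for illustration.

def vc_confidence(h, k, eta):
    """Confidence term for VC dimension h, k training examples, confidence 1 - eta."""
    return math.sqrt((h * (math.log(2.0 * k / h) + 1.0) - math.log(eta / 4.0)) / k)


def risk_bound(r_emp, h, k, eta):
    """Upper bound on the expected risk: empirical risk plus confidence term."""
    return r_emp + vc_confidence(h, k, eta)


small_capacity = vc_confidence(h=10, k=1000, eta=0.05)
large_capacity = vc_confidence(h=50, k=1000, eta=0.05)
more_data = vc_confidence(h=10, k=10000, eta=0.05)
print(round(small_capacity, 4), round(large_capacity, 4), round(more_data, 4))
```

The term grows with the capacity h and shrinks as more training examples become available, which is why a low training error alone does not guarantee a low expected error.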

According to Equation 3, the risk can be controlled by two quantities: $R_{emp}(\alpha)$ and
$h(\{f(x, \alpha) : \alpha \in \Lambda_{sub}\})$, where $\Lambda_{sub}$ is some subset of the index set
$\Lambda$. The empirical risk $R_{emp}$ depends on the choice of the function (i.e., of $\alpha$) applied in
the learning machine. The VC dimension h depends on the set of functions
$\{f(x, \alpha) : \alpha \in \Lambda_{sub}\}$. The parameter h is controlled by introducing a structure of
nested subsets $S_n := \{f(x, \alpha) : \alpha \in \Lambda_n\}$ of $\{f(x, \alpha) : \alpha \in \Lambda\}$,

$S_1 \subset S_2 \subset \ldots \subset S_n \subset \ldots$  (5)

with the corresponding VC dimensions satisfying:

$h_1 \le h_2 \le \ldots \le h_n \le \ldots$  (6)

The structural risk minimization principle chooses the function $f(x, \alpha)$ in the subset
$\{f(x, \alpha) : \alpha \in \Lambda_n\}$ for which the right-hand side of Equation 3 is minimal, so that
the guaranteed risk bound is minimal.
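The selection step of structural risk minimization can be sketched as picking, among nested candidate structures, the one with the smallest guaranteed risk. The sketch assumes the standard VC confidence term; the candidate names, VC dimensions and empirical risks are hypothetical and only illustrate the trade-off.

```python
import math

# Sketch of structural risk minimization over nested function sets.
# Assumption: guaranteed risk = empirical risk + standard VC confidence term.
# The candidates (name, VC dimension, empirical risk) are made up.

def vc_confidence(h, k, eta):
    """Confidence term for VC dimension h, k training examples, confidence 1 - eta."""
    return math.sqrt((h * (math.log(2.0 * k / h) + 1.0) - math.log(eta / 4.0)) / k)


def select_structure(candidates, k, eta=0.05):
    """Return the candidate whose guaranteed risk bound is smallest."""
    return min(candidates, key=lambda c: c[2] + vc_confidence(c[1], k, eta))


# Richer sets fit the training data better but have a larger VC dimension.
candidates = [("S1", 5, 0.30), ("S2", 20, 0.10), ("S3", 200, 0.02)]
best = select_structure(candidates, k=1000)
print(best[0])
```

In this made-up case the middle structure wins: the simplest set underfits, while the richest set pays too much in the confidence term despite its low training error.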

For non-linear tasks such as regression, identification, control and fuzzy system modeling, a non-linear
support vector approach is applied. A non-linear mapping takes the data into a higher-dimensional feature
space, where linear regression is applied. This is made possible by kernel functions, which originate in
the theory of reproducing kernel Hilbert spaces [1,2,3,4]. An inner product in feature space has an
equivalent kernel in input space:

$K(x, y) = k(x) \cdot k(y)$  (7)

where $k(\cdot)$ denotes the mapping into feature space. If K is a positive definite function satisfying
Mercer's conditions,

$K(x, y) = \sum_{m=1}^{\infty} a_m \psi_m(x) \psi_m(y), \quad a_m \ge 0$  (8)

$\iint K(x, y) g(x) g(y) \, dx \, dy > 0, \quad \int g^2(x) \, dx < \infty$  (9)

then the kernel is a legitimate inner product in feature space.

Different functions satisfy Mercer's conditions: polynomials, splines, B-splines, radial basis
functions, etc. In the present work the Gaussian radial basis function is used:

$K(x, y) = \exp\left[-\frac{\|x - y\|^2}{2\sigma^2}\right]$  (10)

The support vector technique places one local Gaussian function at each support vector. This means that no
clustering method is needed. The basis width $\sigma$ can be selected using the structural risk minimization
principle defined in Equation 5.
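A Gaussian radial basis function kernel of this kind can be sketched in a few lines. The function names are hypothetical, and the sample points are made up; the Gram matrix is shown only because it is the quantity a dual support vector formulation works with.

```python
import math

# Sketch of the Gaussian radial basis function kernel and its Gram matrix.
# Function names and sample points are hypothetical.

def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def gram_matrix(points, sigma=1.0):
    """Matrix of kernel values K(x_i, x_j) for all pairs of points."""
    return [[gaussian_kernel(p, q, sigma) for q in points] for p in points]


points = [[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]]
G = gram_matrix(points, sigma=1.0)
print([[round(v, 4) for v in row] for row in G])
```

Each point has kernel value 1 with itself, and the value decays with distance at a rate set by the basis width, so every support vector contributes one local bump.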

A special form of the loss function described in Equation 2 is the $\varepsilon$-insensitive loss function,
which does not penalize errors below some small positive value $\varepsilon$:

$|y_i - f(x_i, \alpha)|_{\varepsilon} = \max\{0, |y_i - f(x_i, \alpha)| - \varepsilon\}$  (11)
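The effect of the insensitivity zone can be sketched directly. The function name is hypothetical, and the default tolerance of 0.2 is simply the value reported later for the sine-curve example.

```python
# Sketch of the epsilon-insensitive loss: deviations inside the tolerance band
# cost nothing, larger deviations are penalized linearly. The function name is
# hypothetical.

def eps_insensitive_loss(y, prediction, eps=0.2):
    """max(0, |y - prediction| - eps)."""
    return max(0.0, abs(y - prediction) - eps)


inside_band = eps_insensitive_loss(1.0, 1.1)    # deviation 0.1 is within the band
outside_band = eps_insensitive_loss(1.0, 1.5)   # deviation 0.5 exceeds the band
print(inside_band, round(outside_band, 3))
```

Points whose deviation stays inside the band contribute no loss at all, which is what allows the solution to be expressed by a small subset of the training data.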

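The estimated function is:

$f(x) = w^T x + b, \quad w, x \in R^n, \; b \in R$  (12)

With the application of the $\varepsilon$-insensitive loss function, the following quadratic optimization
problem can be defined:

$\min \Phi(w, \xi^*, \xi) = \frac{1}{2} (w^T w) + C \left( \sum_{i=1}^{k} \xi_i^* + \sum_{i=1}^{k} \xi_i \right)$  (13)

subject to the following constraints:

$w^T x_i + b - y_i \le \varepsilon + \xi_i$

$y_i - w^T x_i - b \le \varepsilon + \xi_i^*$

$\xi_i, \xi_i^* \ge 0$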

where $\xi$, $\xi^*$ are slack variables. A small subset of the training data, called the support vectors
(SV), is extracted from the data, and the optimal separation found for it is equivalent to the optimal
separation for the entire data set. Optimal separation means that the minimal distance from the closest
point to the separating hyperplane between the two classes is maximal. If the hyperplane is described by the
equation $y = w x_i + b$, then finding the optimal hyperplane means minimizing $\|w\|^2$ subject to the
constraints defined in Equation 13 [2,3,4]. In general it can be solved as a dual quadratic problem, and
this form is referred to as the support vector machine (SVM) technique. The scalar C controls the trade-off
between complexity and the proportion of non-separable sample points [3]. In Equation 13 the first term,
$\|w\|^2$, expresses the model complexity, and the second term accumulates the $\varepsilon$-insensitive
loss described in Equation 11.

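The following dual problem can be developed from Equation 13, using the property from Equation 7:

$\max \Phi(\alpha^*, \alpha) = -\varepsilon \sum_{i=1}^{k} (\alpha_i^* + \alpha_i) + \sum_{i=1}^{k} (\alpha_i^* - \alpha_i) y_i - \frac{1}{2} \sum_{i,j=1}^{k} (\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j) K(x_i, x_j)$  (14)

subject to the following constraints:

$\sum_{i=1}^{k} (\alpha_i^* - \alpha_i) = 0$

$\alpha_i^*, \alpha_i \in [0, C]$

and w is defined as:

$w = \sum_{i=1}^{k} (\alpha_i^* - \alpha_i) x_i$  (15)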

APPLICATION 

For illustration, the model described in Equation 14 was applied to the simple example of identifying a
noisy sine curve (Figure 1).




Fig.  1.  Noisy  data  identification,  +  prediction,  -  actual  data,  -  sine  curve 


Fig. 2. Support vectors (circles) for the noisy data problem from Fig. 1. Dashed lines define the 2ε area.

The parameters were the following: C = 100, σ = 1, ε = 0.2. The support vectors found by the SVM are
presented in Figure 2. In the proposed model, a Sugeno-type fuzzy inference system was applied. The
membership functions were Gaussian radial basis functions (Equation 10). From Figure 1, it is evident that
the prediction is close to the sine curve. The even-numbered data formed the training set and the
odd-numbered data the testing set. Some practical applications of the SVM can be found in [4,5,6].




CONCLUSION 

An  alternative  approach  for  fuzzy  rules  generation  is  described.  Instead  of  generating  the  cluster  centers, 
some  vectors  are  chosen  by  using  certain  described  criteria.  The  structure  of  the  learning  machine  is 
defined  during  training.  The  Vapnik  Chervonenkis  (VC)  dimension  is  introduced  as  a  measure  of  the 
capacity  of  the  learning  machine.  A  prediction  of  the  expected  error  on  the  yet  unseen  examples  can  be 
estimated  with  the  help  of  the  VC  dimension. 

The structural risk minimization principle is introduced to construct a machine with the lowest expected
error. As a result, validation is unnecessary, since a reliable upper bound on the actual error is
formulated based on the VC dimension. This is particularly important when the data set is small.

The problem is transformed into a reproducing kernel Hilbert space, which is a very efficient way to
turn a non-linear problem into a linear one. The advantages over other fuzzy-neural hybrid systems are
that the architecture of the system does not have to be determined before training and that the solution
of the optimization problem is unique, whereas conventional ANNs have error surfaces with multiple local
minima.


REFERENCES 

1. A. Aizerman, E. M. Braverman, L. I. Rozoner, 1964. Theoretical foundations of the potential method in
pattern recognition learning, Automation and Remote Control, 25, 821-837.

2. B. Scholkopf, C. Burges, V. Vapnik, 1995. Extracting Support Data for a Given Task, in U. M. Fayyad
and R. Uthurusamy, eds., First International Conference on Knowledge Discovery and Data Mining,
Proceedings, AAAI Press, Menlo Park, CA.

3. V. N. Vapnik, S. E. Golowich and A. Smola, Support vector method for function approximation,
regression and signal processing, in Advances in Neural Information Processing Systems, Vol. 9, MIT
Press, Cambridge, MA, USA.

4. V. N. Vapnik, 1998. Statistical Learning Theory, John Wiley and Sons.

5. B. Novak, 1999. Application of support vectors machines for the non-linear identification in the
electrical power systems, submitted to the Trans. on Neural Networks, IEEE.

6. B. Novak, 1999. Computer Supported Medical Diagnosis, MIE 99, in print.




Fuzzy  Clustering  Model  based  on  Changes  in  Vagueness 

Mika Sato-Ilic

University  of  Tsukuba 
Inst. of Policy and Planning Sciences
Tenodai  1-1-1,  Tsukuba,  Ibaraki  305-8573,  Japan 
Email:  mika@sk.tsukuba.ac.jp 


ABSTRACT 

This  paper  proposes  a  fuzzy  clustering  model  to  extract  the  exact  changes  of  vagueness  in  data,  which  are 
observed  as  similarities  of  objects  over  time.  That  is,  the  objective  data  is  assumed  to  have  vagueness, 
which changes over time. I regard this data as 3-way data. For such 3-way data, the most difficult problem
has  been  that  the  optimal  solutions  at  different  times  are  in  conflict  with  one  another.  In  order  to  solve  this 
problem,  conventional  methods  have  used  parameters  to  represent  the  weights  of  clusters  at  different  times. 

However,  in  such  a  case,  we  cannot  see  the  exact  change  in  vagueness.  So,  in  this  paper,  I  propose  a 
clustering  model  for  defining  situations  of  dynamic  change.  The  vagueness  of  an  observation  is  defined  by 
convex  and  normal  fuzzy  sets  (CNF  sets).  I  define  a  conical  membership  function  to  represent  the  CNF 
sets.  The  dissimilarity  between  two  observations  is  defined  as  a  fuzzy  asymmetric  dissimilarity.  In  order  to 
deal  with  the  asymmetry  of  this  fuzzy  dissimilarity,  I  use  an  asymmetric  aggregation  operator  which  is 
similar  to  the  asymmetric  metric.  Finally,  I  present  numerical  results  from  an  application  of  the  proposed 
model,  which  will  show  the  validity  of  the  model. 


INTRODUCTION 

In conventional cluster analysis vagueness has not been considered, even with observational error, when
defining a similarity (or dissimilarity) between objects. However, in real situations, there are many cases
where the data involve subjective or linguistic vagueness. For instance, a dissimilarity based on human
relationships or perceptual confusion is considered to have some fuzziness. To represent such a
dissimilarity, I propose a fuzzy dissimilarity between a pair of fuzzy observations, which is an extension of
the fuzzy distance. A fuzzy distance between two fuzzy data $O_i$ and $O_j$ is defined by the use of the two
α-level sets $O_i(\alpha)$ and $O_j(\alpha)$. In usual set theory, there are many definitions of the distance
between two sets. Here I employ a kind of asymmetric dissimilarity between $O_i(\alpha)$ and $O_j(\alpha)$.
The dissimilarity from $O_i(\alpha)$ to $O_j(\alpha)$ is defined as the maximum distance from a point of
$O_i(\alpha)$ to the set $O_j(\alpha)$; the dissimilarity from $O_j(\alpha)$ to $O_i(\alpha)$ is defined
likewise, from a point of $O_j(\alpha)$ to the set $O_i(\alpha)$ (both are defined in detail in the
subsequent section).
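The asymmetry of such a set-to-set dissimilarity can be sketched on finite samples of two level sets. The sketch assumes that "the maximum distance from a point of one set to the other set" means the directed Hausdorff distance with Euclidean point distances; this is an interpretation, since the paper's own definition is given in a later section, and the sample points and function name are hypothetical.

```python
import math

# Sketch of an asymmetric set-to-set dissimilarity on finite point samples.
# Assumption: directed Hausdorff distance with Euclidean point distances, used
# here as an interpretation of the verbal definition. Sample points are made up.

def directed_dissimilarity(set_a, set_b):
    """Largest distance from a point of set_a to its nearest point in set_b."""
    return max(min(math.dist(a, b) for b in set_b) for a in set_a)


level_set_i = [(0.0, 0.0), (1.0, 0.0)]   # sample of one alpha-level set
level_set_j = [(0.0, 0.0)]               # sample of another alpha-level set

d_ij = directed_dissimilarity(level_set_i, level_set_j)
d_ji = directed_dissimilarity(level_set_j, level_set_i)
print(d_ij, d_ji)
```

The two directions give different values whenever one set extends beyond the other, which is the asymmetry that the aggregation operator then has to handle.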

In order to get the fuzzy distance or fuzzy dissimilarity, it is natural to assume that the classified object is
observed as fuzzy data, which has been defined by several researchers ([1,4,7]). The fuzzy data in this paper
is represented by a fuzzy set with a conical membership function in which the boundaries of the α-level
surfaces are hyperellipsoids in $R^n$. This means that the direction of the hyperellipsoid shows a correlation
with the vagueness of the observed attributes, and the size of the hyperellipsoid is considered the degree of
vagueness. From the definition of fuzzy dissimilarity, it follows that this dissimilarity varies with the shape
of the hyperellipsoid, which is determined by the degree of the fuzziness and its correlation, so such a
dissimilarity can be employed. To deal with the asymmetry of this fuzzy dissimilarity, I use an asymmetric
aggregation operator [8] which is similar to the asymmetric metric.

This paper focuses on 3-way data, i.e., dissimilarities of data observed over time. The vagueness of the relationships among observations changes with time. Real situations may produce such data, e.g., diagnostic data on a system, or on an illness in medical care or the biological sciences. To treat such data, I have proposed a dynamic clustering model which can extract changes in the vagueness over time [8]. Among conventional methods for such 3-way data, INDSCAL [2] and INDCLUS [3] are well known for dealing with data that consist of similarities among objects over time (or context). In these conventional methods, the differences at different times are represented by parameters which show the weights of clusters for each time. However, in such cases, we cannot obtain a result which shows the


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




exact changes in the clustering situation. So, I propose a model that can show the exact changes over time by fixing the clusters across the different times, i.e., the model obtains cluster solutions in a fixed space. The solutions can then be compared with each other over time. In order to construct the model, I use the idea of a fuzzy additive clustering model [8].


FUZZY  DATA 

Fuzzy data in $R^n$ are defined by membership functions $\mu(x)$, $x \in R^n$. I assume that the membership functions are conical, and defined as follows using a vector and a matrix. A position vector corresponding to fuzzy object $O_i$, $i=1,\ldots,N$ is denoted by $o_i \in R^n$, and $P_{O_i}$ is a positive semidefinite $n \times n$ matrix. I denote by $\|\cdot\|_{O_i}$ the following elliptical norm of the distance between an arbitrary $x \in R^n$ and $o_i$:

$$\|x - o_i\|_{O_i} = \begin{cases} \left[(x - o_i)^T P_{O_i}^{-1} (x - o_i)\right]^{1/2} & \text{if } (x - o_i) \in X_{O_i}, \\ +\infty & \text{otherwise,} \end{cases}$$

where $P_{O_i}^{-1}$ is the Moore-Penrose generalized inverse of $P_{O_i}$, and $X_{O_i} \subset R^n$ is the linear space spanned by the columns of $P_{O_i}$. The conical membership function of $O_i$ is

$$\mu_{O_i}(x) = 1 - \min\{1, \|x - o_i\|_{O_i}\}. \tag{1}$$

The property $\mu_{O_i}(x) \ge 0$ is satisfied. If the matrix $P_{O_i}$ is positive definite, then Equation 1 defines a cone in $R^n$, and the boundaries of the support of $\mu_{O_i}$ and of the other level surfaces $\mu_{O_i} = \text{constant}$ are hyperellipsoids in $R^n$. The conical membership function $\mu_{O_i}$ is normalized to $0 \le \mu_{O_i} \le 1$, and the fuzzy set with a conical membership function is a convex, normal fuzzy set [9]. In $R^1$, the conical membership function is the triangular function

$$\mu_{O_i}(x) = 1 - \min\{1, |x - o_i| / s_{O_i}\},$$

where $s_{O_i} = P_{O_i}^{1/2}$ is the spread of the function. Therefore, the diagonal elements of $P_{O_i}$ denote the fuzziness corresponding to each variable $x_t$, $t=1,\ldots,p$, and the off-diagonal elements of $P_{O_i}$ indicate possible interactions between the components of the fuzzy membership functions, namely they describe the tendency of a correlation of vagueness among the variables. Figure 1 shows the conical membership functions of the fuzzy objects $O_i$, $i=1,2$ in $R^2$, where $O_i(\alpha)$ are $\alpha$-level surfaces of these conical membership functions.
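The conical membership function can be evaluated numerically. The following is a minimal Python sketch, assuming the elliptical norm has the form $[(x-o)^T P^{+} (x-o)]^{1/2}$ with $P^{+}$ the Moore-Penrose inverse; the function name `conical_membership` and the example values are illustrative and are not taken from the paper.

```python
import numpy as np

def conical_membership(x, o, P):
    """Grade 1 - min(1, ||x - o||_P) for a conical fuzzy set (illustrative sketch)."""
    v = np.atleast_1d(np.asarray(x, dtype=float) - np.asarray(o, dtype=float))
    P = np.atleast_2d(np.asarray(P, dtype=float))
    P_pinv = np.linalg.pinv(P)  # Moore-Penrose generalized inverse
    # The norm is finite only if x - o lies in the column space of P.
    in_span = np.allclose(P @ P_pinv @ v, v)
    if in_span:
        norm = np.sqrt(max(float(v @ P_pinv @ v), 0.0))
    else:
        norm = np.inf
    return 1.0 - min(1.0, norm)

# In R^1 with P = [[4]] the spread is sqrt(4) = 2 (triangular function).
grade_1d = conical_membership(2.0, 1.0, [[4.0]])
# In R^2 with a singular P, points outside the column space get grade 0.
grade_outside = conical_membership([0.0, 1.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 0.0]])
```

In the one-dimensional case the sketch reduces to the triangular function, so a point one unit from the centre with spread 2 receives grade 0.5.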


Fig.  1.  Conical  membership  functions. 


FUZZY  DISSIMILARITY 

In order to measure the dissimilarity among the above fuzzy objects, I introduce the fuzzy dissimilarity shown in Figure 2. This figure shows a simplified view of this fuzzy dissimilarity in $R^2$; $O_i(\alpha)$, $i=1,2$ are $\alpha$-level surfaces of the conical membership functions, and $x_t$, $t=1,2$ are variables. The fuzzy dissimilarities at $\alpha$-level between the $i$-th and $j$-th fuzzy objects are defined as follows:

$$d_{L(ij)} = \sup\{d(x, O_j(\alpha)) \mid x \in O_i(\alpha)\}, \quad d_{R(ij)} = \sup\{d(O_i(\alpha), x) \mid x \in O_j(\alpha)\},$$

where $d(x, O_i(\alpha))$, $i=1,\ldots,N$ is a distance between a point $x$ and a set $O_i(\alpha)$ in the usual sense, namely

$$d(x, O_j(\alpha)) = \min\{d(x,y) \mid y \in O_j(\alpha)\}.$$




In Figure 2, $d_{\min(ij)}$ ($i,j = 1,2$, $i \neq j$) shows an ordinary distance between the two sets, if I assume that $O_i(\alpha)$, $i=1,2$ are conventional sets. $d_{\min(ij)}$ is invariant to the shape and the direction of the sets, and it holds that $d_{\min(ij)} = d_{\min(ji)}$. On the other hand, the fuzzy dissimilarities $d_{L(ij)}$ and $d_{R(ij)}$ vary with the shape and direction of the sets, and in general it holds that $d_{L(ij)} \neq d_{R(ij)}$. If $x \in R^1$, then $d_{L(ij)}$ and $d_{R(ij)}$ are lower and upper fuzzy distances, respectively [6], so if I define

$$d_{(ij)} = \max\{d_{L(ij)}, d_{R(ij)}\},$$

then $d_{(ij)}$ satisfies the condition of a Hausdorff distance.
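The relation between the two directed dissimilarities and their maximum can be illustrated on finite point sets. The sketch below is only an approximation of the construction in this section: it assumes each $\alpha$-level set is represented by a finite sample of points, and it assumes that $d_L$ runs from the first set to the second and $d_R$ from the second to the first. The function names are hypothetical.

```python
import numpy as np

def directed_distance(A, B):
    """Largest distance from a point of A to its nearest point in B."""
    diff = A[:, None, :] - B[None, :, :]
    pairwise = np.linalg.norm(diff, axis=2)   # shape (len(A), len(B))
    return float(pairwise.min(axis=1).max())

def fuzzy_dissimilarities(A, B):
    """Return (d_L, d_R, d) with d = max(d_L, d_R), the Hausdorff distance."""
    d_L = directed_distance(A, B)
    d_R = directed_distance(B, A)
    return d_L, d_R, max(d_L, d_R)

# Two small point sets on a line; the directed distances differ.
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 0.0], [4.0, 0.0]])
d_L, d_R, d_H = fuzzy_dissimilarities(A, B)
```

For these two sets the directed values are 1 and 3, which shows why the pair is asymmetric while their maximum is symmetric in the two sets.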


ADDITIVE  FUZZY  CLUSTERING  MODEL 

The main purpose of unsupervised clustering of a set of objects is to detect natural subgroups (clusters) based on the similarity between pairs of objects. According to the definition of a fuzzy set and the interpretation of natural subgroups, I have proposed an additive fuzzy clustering model. I define the additive fuzzy clustering model as follows:

$$s_{ij} = \sum_{k=1}^{K} \rho(u_{ik}, u_{jk}) + \varepsilon_{ij}, \tag{2}$$

where the similarity $s_{ij}$ has a ratio scale and $0 \le s_{ij} \le 1$. $u_{ik}$ is a fuzzy grade which represents the degree of belongingness of object $i$ to cluster $k$. Generally, the $u_{ik}$ are denoted using the matrix representation $U=(u_{ik})$, called a partition matrix, which satisfies the following condition:

$$u_{ik} \ge 0, \quad \sum_{k=1}^{K} u_{ik} = 1. \tag{3}$$

The function $\rho(u_{ik}, u_{jk})$ is the degree of simultaneous belonging of objects $i$ and $j$ to cluster $k$. That is, the function denotes the degree of sharing of common properties. $\varepsilon_{ij}$ ($= \varepsilon_{ji}$) is an error term. The aggregation function is assumed to satisfy the following conditions:


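1. $0 \le \rho(u_{ik}, u_{jl}) \le 1$, $\rho(u_{ik}, 0) = 0$, $\rho(u_{ik}, 1) = u_{ik}$,

2. $\rho(u_{ik}, u_{sl}) \le \rho(u_{jk}, u_{tl})$ whenever $u_{ik} \le u_{jk}$, $u_{sl} \le u_{tl}$,

3. $\rho(u_{ik}, u_{jl}) = \rho(u_{jl}, u_{ik})$,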

where $i,j,s,t$ are suffixes for objects, $k,l$ are suffixes for clusters, and they satisfy $1 \le i,j,s,t \le n$, $1 \le k,l \le K$. The method of fuzzy clustering based on this model is to find the partition matrix $U=(u_{ik})$ which satisfies condition 3 and has the best fitness for model 2. That is, I find the $U$ which minimizes the following sum of squared errors $\eta^2$ under condition 3,


$$\eta^2 = \sum_{i \neq j = 1}^{n} \left( s_{ij} - \sum_{k=1}^{K} \rho(u_{ik}, u_{jk}) \right)^2.$$
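The fitting criterion can be made concrete with a short sketch. It assumes the algebraic product as the aggregation function (the choice stated later for the numerical example) and an unnormalised sum of squared errors over the off-diagonal pairs; both function names are hypothetical, and no optimiser is included.

```python
import numpy as np

def model_similarity(U):
    """Model part of s_ij: sum over clusters of u_ik * u_jk (product aggregation)."""
    return U @ U.T

def sum_squared_error(S, U):
    """Sum of squared residuals between S and the model over all pairs i != j."""
    residual = S - model_similarity(U)
    off_diagonal = ~np.eye(S.shape[0], dtype=bool)
    return float(np.sum(residual[off_diagonal] ** 2))

# Three objects, two clusters; every row of U sums to 1 as required by the constraint.
U = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
S_exact = model_similarity(U)      # similarities reproduced exactly by U
S_flat = np.full((3, 3), 0.5)      # similarities with no cluster structure
error_exact = sum_squared_error(S_exact, U)
error_flat = sum_squared_error(S_flat, U)
```

A partition matrix that reproduces the similarities gives zero error, while the flat similarity matrix leaves a residual of 0.5 on each of the six off-diagonal entries.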


ASYMMETRIC  ADDITIVE  FUZZY  CLUSTERING  MODEL 

In practical applications, similarity data are not always symmetric. For such asymmetric similarity data, the additive fuzzy clustering model 2 is extended using an asymmetric aggregation function, which is defined as a function satisfying conditions 1 and 2 of the previous section. Denoting the asymmetric aggregation function by $\gamma(x, y)$, this is defined as follows:




Suppose that $f(x)$ is a generating function of a $t$-norm and $\phi(x)$ is a continuous, monotone decreasing function satisfying $\phi: [0,1] \to [1, \infty]$, $\phi(1) = 1$. Then we define the asymmetric aggregation operator $\gamma(x, y)$ as:

$$\gamma(x, y) = f^{[-1]}\left(f(x) + f(y)\,\phi(x)\right).$$

Using the asymmetric aggregation function, we define the additive fuzzy clustering model for asymmetric similarity data as follows:

$$s_{ij} = \sum_{k=1}^{K} \gamma(u_{ik}, u_{jk}) + \varepsilon_{ij}, \tag{5}$$

where $\gamma(u_{ik}, u_{jk}) \neq \gamma(u_{jk}, u_{ik})$. This model is expected to find clusters in which the objects are not only similar to each other but also asymmetrically related.
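A small sketch can show how such an operator behaves. It uses the generator $f(x) = (1-x)/x$ and the weight $\phi(x) = 1/x^2$ quoted in the numerical example of this paper, and it assumes the operator is composed as $f^{-1}(f(x) + f(y)\phi(x))$, a form that reproduces the closed expression $x^2 y/(1 - y + xy)$ given there. Arguments are assumed to lie in $(0, 1]$; the function names are illustrative.

```python
def f(x):
    """Generator of the Hamacher product."""
    return (1.0 - x) / x

def f_inv(z):
    """Inverse of the generator on [0, inf)."""
    return 1.0 / (1.0 + z)

def phi(x):
    """Decreasing weight with phi(1) = 1."""
    return 1.0 / (x * x)

def gamma(x, y):
    """Asymmetric aggregation built from the generator and the weight (assumed form)."""
    return f_inv(f(x) + f(y) * phi(x))

def gamma_closed(x, y):
    """Closed form quoted in the numerical example."""
    return (x * x * y) / (1.0 - y + x * y)

forward = gamma(0.5, 0.8)
backward = gamma(0.8, 0.5)
```

Swapping the arguments changes the value, which is the asymmetry the model relies on, while `gamma(x, 1.0)` returns `x` as the boundary condition of the aggregation function requires.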


DYNAMIC  ADDITIVE  FUZZY  CLUSTERING  MODEL 


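The data are observed as values of similarity with respect to $n$ objects for $T$ times, and the similarity matrix of the $t$-th time is denoted by $S^{(t)} = (s_{ij}^{(t)})$. Then a super-matrix is defined as follows:

$$\tilde{S} = \begin{pmatrix} S^{(1)} & S^{(12)} & S^{(13)} & \cdots & S^{(1T)} \\ S^{(21)} & S^{(2)} & S^{(23)} & \cdots & S^{(2T)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ S^{(T1)} & S^{(T2)} & S^{(T3)} & \cdots & S^{(T)} \end{pmatrix}, \tag{6}$$

where $S^{(t)}$ is an asymmetric similarity matrix for the $t$-th time, $S^{(t)} = (s_{ij}^{(t)})$, $s_{ij}^{(t)} \neq s_{ji}^{(t)}$. $S^{(rt)}$ is defined by $S^{(r)}$ and $S^{(t)}$ as follows:

$$S^{(rt)} = (s_{ij}^{(rt)}), \quad s_{ij}^{(rt)} = \gamma(s_{ij}^{(r)}, s_{ij}^{(t)}),$$

where $\gamma(\cdot,\cdot)$ is the asymmetric aggregation operator shown above. Then the model is as follows:

$$\tilde{s}_{ij} = \sum_{k=1}^{K} \gamma(u_{i^{(t)}k}, u_{j^{(t)}k}), \quad 1 \le i,j \le Tn, \quad i = n(t-1) + i^{(t)}, \tag{7}$$

where $1 \le i^{(t)} \le n$, $1 \le t \le T$. $u_{i^{(t)}k}$ is the degree of belongingness at the $t$-th time of an object $i$ to a cluster $k$, and $\tilde{s}_{ij}$ is the $(i,j)$-th element of the matrix $\tilde{S}$ in 6.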


NUMERICAL  EXAMPLE 

I shall show a numerical experiment using the artificial data shown in Figures 3 and 4. These diagrams denote the $\alpha$-level surfaces of the conical fuzzy membership functions of 8 fuzzy objects at two different times. From Figure 3 to Figure 4, the 4th object changes in vagueness, i.e., the object changes from a fuzzy observation to a crisp one. In this case, the value of $\alpha$ is 0.5. The proposed model uses similarity, so I transformed the dissimilarity data to similarity data using the following linear transformation in the unit interval:

$$s_{L(ij)} = 1 - \frac{d_{L(ij)}}{d_{L(\max)}}, \quad d_{L(\max)} = \max_{i,j} d_{L(ij)}, \qquad s_{R(ij)} = 1 - \frac{d_{R(ij)}}{d_{R(\max)}}, \quad d_{R(\max)} = \max_{i,j} d_{R(ij)}.$$
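The transformation can be sketched in a few lines, assuming it has the form $s = 1 - d/d_{\max}$ applied separately to the upper-triangular ($d_L$) and lower-triangular ($d_R$) parts of a distance matrix. The example uses only the leading $3 \times 3$ block of Table 1, so the maxima differ from those of the full data; the function name is hypothetical.

```python
import numpy as np

def triangles_to_similarity(D):
    """Map upper-triangle (d_L) and lower-triangle (d_R) distances into [0, 1]."""
    upper = np.triu_indices_from(D, k=1)
    lower = np.tril_indices_from(D, k=-1)
    d_L = D[upper].astype(float)
    d_R = D[lower].astype(float)
    s_L = 1.0 - d_L / d_L.max()
    s_R = 1.0 - d_R / d_R.max()
    return s_L, s_R

# Leading 3 x 3 block of Table 1 (objects 1 to 3), used only as an illustration.
D = np.array([[0, 8, 9],
              [9, 0, 9],
              [11, 8, 0]])
s_L, s_R = triangles_to_similarity(D)
```

The largest distance in each triangle maps to similarity 0 and a zero distance would map to similarity 1, so both sets of values stay in the unit interval.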

Tables 1 and 2 show the fuzzy dissimilarity matrices for these fuzzy objects (Figs. 3 and 4); in particular, the upper triangular matrix denotes $d_{L(ij)}$, while the lower triangular matrix is $d_{R(ij)}$. Table 3 shows the results of model 2 using the fuzzy dissimilarity shown in Table 1, in which the similarities are $s_{L(ij)}$, $s_{R(ij)}$, and the transformed Hausdorff distances, respectively. I use the algebraic product as $\rho$. In this table, $C_1$, $C_2$ represent two clusters, and each value denotes the fuzzy grade with which a fuzzy object belongs to each cluster. $\eta^2$ shows the value of fitness for the method. From this table, I can find the difference of the 4th and 5th
fuzzy objects for belonging to the clusters. Namely, if the classification is based on $s_{L(ij)}$, then the 4th and 5th fuzzy objects are combined with the 1st - 3rd fuzzy objects, otherwise, by $s_{R(ij)}$, they belong to a cluster whose components are the 6th - 8th fuzzy objects. This difference is caused by a feature of fuzzy
dissimilarities, that is, these dissimilarities depend on the direction and the size of the hyperellipsoids. The results for $s_{L(ij)}$ and the Hausdorff distance are almost the same. However, in the case of the Hausdorff distance, the value of fitness is worse than in the case of $s_{L(ij)}$.




Figure 5 shows the results of model 7 using the data shown in Tables 1 and 2. The asymmetric aggregation operator in model 7 is $\gamma(x,y) = (x^2 y)/(1-y+xy)$, which is created by a generating function of the Hamacher product [5], $f(x) = (1-x)/x$, and $\phi(x) = 1/x^2$. In this figure, the abscissa shows the object numbers and the ordinate shows the degree of belonging of objects to clusters: (a) shows the degree of belonging to cluster 1 while (b) shows the degree of belonging to cluster 2. From this figure, I can see changes in the situation of object 4. Moreover, we can see movement in object 7. This reflects the large change in dissimilarity between objects 4 and 7 from the 1st time to the 2nd time, compared with the other dissimilarities between the objects.


Table 1: Fuzzy distance matrix at 1st time (α = 0.5).

No    1    2    3    4    5    6    7    8
1     0    8    9   21   22   63   63   71
2     9    0    9   24   25   57   73   57
3    11    8    0   14   16   56   54   63
4    50   42   39    0    9   78   86   75
5    51   53   40    9    0   75   71   83
6    62   55   51   13   13    0    8    7
7    66   65   55   22   25    8    0   10
8    72   59   62   23   25   14   12    0

Table 2: Fuzzy distance matrix at 2nd time (α = 0.5).

No    1    2    3    4    5    6    7    8
1     0    8    9   26   22   63   63   71
2     9    0    9   27   25   57   73   57
3    11    8    0   19   16   56   54   63
4    18   16   10    0    5   37   43   34
5    51   53   40    5    0   75   71   83
6    62   55   51   --   13    0    8    7
7    --   65   --   --   25    8    0   10
8    --   --   --   40   25   14   12    0

(-- : entry illegible in the source.)


Fig. 3. α-level surfaces of conical membership functions at 1st time (α = 0.5).

Fig. 4. α-level surfaces of conical membership functions at 2nd time (α = 0.5).


Table 3: Fuzzy clustering at 1st time (α = 0.5).

          s_L(ij)          HD           s_R(ij)
No.      C1     C2      C1     C2      C1     C2
1       .16    .84     .24    .76     .01    .99
2       .20    .80     .26    .74     .09    .91
3       .19    .81     .27    .73     .16    .84
4       .06    .94     .10    .90     .72    .28
5       .08    .91     .15    .85     .74    .26
6       .90    .10     .94    .06     .88    .12
7       .93    .07     1.0    .00     .91    .09
8       .94    .06     .98    .02     .91    .09
η²        .0343           .1969          .0326

HD = Hausdorff distance




(a) Clustering for cluster 1.

Fig. 5. Dynamic fuzzy clustering (α = 0.5 and η = 0.0015).


CONCLUSION 

This paper proposed a clustering model which can extract changes in the vagueness involved in observations. This is made possible by obtaining the solutions in the same coordinate space; in other words, the model obtains clustering results using the same clusters throughout time. The representation of vagueness for real data is now under consideration.


REFERENCE 

1. H. Bandemer and W. Näther, 1992. Fuzzy Data Analysis. Kluwer Academic Publishers.

2. J.D. Carroll and J.J. Chang, 1970. Analysis of Individual Differences in Multidimensional Scaling via an N-way Generalization of "Eckart-Young" Decomposition. Psychometrika, 35, 283-319.

3. J.D. Carroll and P. Arabie, 1983. INDCLUS: An Individual Differences Generalization of the ADCLUS Model and MAPCLUS Algorithm. Psychometrika, 48, 157-169.

4. A. Celmins, 1992. Nonlinear Least-Squares Regression in Fuzzy Vector Spaces. Fuzzy Regression Analysis, (J. Kacprzyk et al. eds.), 153-168.

5. R. Fuller, 1991. On Hamacher-sum of Triangular Fuzzy Numbers. Fuzzy Sets and Systems, 42, 205-212.

6. L.T. Koczy and K. Hirota, 1993. Ordering, distance and closeness of fuzzy sets. Fuzzy Sets and Systems, 59, 281-293.

7. T. Okuda, Y. Kodono and K. Asai, 1992. Approximate Maximum Likelihood Estimates in Regression Models for Fuzzy Observation Data. Fuzzy Regression Analysis, (J. Kacprzyk et al. eds.), 169-180.

8. M. Sato-Ilic, 1999. On Dynamic Clustering Models for 3-way Data. J. of Advanced Computational Intelligence, 3(1), 28-35.

9. L.A. Zadeh, 1965. Fuzzy Sets. Information and Control, 8, 338-353.




Thin  Films  and  Surface  Processing 




Modelling  and  Control  of  Optical  Interference  Filters  Using 
Plasma  Assisted  Chemical  Vapour  Deposition 


D.A. Linkens*, M.F. Abbod*, J.G. Metcalfe** and B. Nichols**

*  Department  of  Automatic  Control  and  Systems  Engineering, 

University  of  Sheffield,  Sheffield,  England,  U.K. 

** GEC-Marconi Materials Technology Limited, Caswell, Towcester, U.K.

ABSTRACT 

Optical  interference  filters  with  continuously  modulated  refractive  indices  throughout  their  thickness  (rugates) 
are  designed  and  fabricated  using  microwave  plasma-assisted  chemical  vapour  deposition  techniques. 
Intelligent  modelling  and  control  techniques  are  used  to  model  the  process  and  improve  the  reproducibility  of 
reflection  filters. 


INTRODUCTION 

Dielectric optical interference filters are made by depositing a sequence of discrete layers of transparent materials that have different refractive indices. Filters with different optical performances can be made by depositing high and low refractive index materials in multiple layers. One of the most significant recent advances in optical filters has been the realisation of rugate type filter designs. A rugate filter (Latin: rugosus = wrinkled) is an interference filter in which the refractive index varies periodically as a smooth continuous function of optical thickness, the simplest example being a sinusoidal variation. The implementation of the most advanced rugate type designs should achieve optical performances that would have been impossible using conventional designs. The realisation of rugate filters requires both the availability of an optical material whose refractive index can be varied significantly by varying its composition and a means of depositing the material uniformly over a surface while controlling its index continuously to a high accuracy.

A model was developed for a similar process (semiconductor manufacturing) to predict the thickness of the deposited polysilicon films using a fuzzy inference system [1]. In the present paper, a neuro-fuzzy model is presented to model the process of depositing thin films on a transparent substrate. The model is based on experimental data obtained from a microwave plasma-assisted chemical vapour deposition (MPACVD) process, utilising different inputs such as gas flow rates, pressure, microwave power and temperature. The model output is the predicted growth rate for each deposited layer and its refractive index. This output is fed to the rugate filter design to obtain the final characteristics of the filter, i.e., wavelength and stop band.

Based  on  the  model,  a  control  technique  is  under  development  to  set  up  the  inputs  and  calculate  the  number 
and  length  of  material-depositing  cycles.  Utilising  the  growth  characteristics  of  the  rugate  filter,  an  on-line 
correction  procedure  is  used  in  each  cycle  to  yield  consistent,  required  filter  characteristics.  In  this  paper  we 
discuss  issues  of  concern  in  designing  a  rugate  fabrication  process.  We  present  control  and  monitoring 
methods  that  have  been  investigated,  and  suggest  areas  where  improved  process  capability  in  controllers  based 
on  intelligent  principles  would  aid  development  of  this  emerging  technology. 


DEPOSITION  PROCESSES 

The specific rugate coatings are optical interference films where the refractive index of the film varies continuously and periodically as a function of the film's optical thickness. The index variation is typically achieved by co-depositing two materials of different index. Co-deposition complicates process monitoring and control. Bleed gas rates for reactive deposition and microwave beam parameters must be actively controlled as the constituent material rates are varied to ensure good process quality. Blended material systems are more sensitive to temperature and source plume distributions than are discrete stack filters. Deviation in the




controller  settings  for  source  flow  rates  and  in  temperature  can  cause  an  increase  in  out-of-band  transmission 
in  the  filters.  The  challenge  of  fabricating  rugate  filters  is  not  so  much  filter  design  as  process  control. 

Since a large refractive index (RI) variation is needed in the filter deposition, silicon oxynitride was used, which retains a glassy phase throughout its composition from silica (refractive index 1.46) to silicon nitride (refractive index 2.04), the index varying smoothly with composition. The precursors used are silicon tetrachloride (SiCl4), O2 and N2 gases. The SiCl4 is contained in a constant-temperature bubbler and transported on a carrier gas, normally nitrogen. The argon affects the power density of the microwave plasma but does not take part in deposition. Silicon oxynitride films deposited by MPACVD are remarkably durable and highly resistant to scratching. They resist acid attack and can be handled and cleaned without taking special precautions.


DEPOSITION  EQUIPMENT 

Microwave plasma-assisted chemical vapour deposition (MPACVD) is a new technique being applied to fabricate hard, dielectric optical coatings. The method is currently being developed within GEC-Marconi [2]. The deposition equipment is shown schematically in Figure 1. It consists of a plasma chamber made of fused silica and contained in a furnace designed to operate up to 1000°C. The chamber is evacuated by a pump to maintain a working pressure in the chamber of about 1 mbar for the required gas flow. Microwave power is coupled to the chamber by a tuned launcher enclosing the gas inlet limb. The waveguide feed incorporates a cross-coupler with attenuators so that forward and reflected power are continuously monitored. Typically, a net microwave power of 1 kW is supplied to the plasma. All gases supplied to the chamber are controlled using precision mass-flow controllers. A computer program calculates the flow parameters to achieve the required RI modulation against optical thickness during each rugate cycle based on known calibration data.


Fig. 1. Schematic diagram of the deposition apparatus.

PROCESS  MODELING 

Neurofuzzy  Modeling  (ANFIS) 

The  Adaptive  Network-based  Fuzzy  Inference  System  (ANFIS)  architecture  is  based  on  a  fuzzy  inference 
system  implemented  in  the  framework  of  an  adaptive  network  [3].  Using  a  hybrid  learning  procedure,  ANFIS 
can  learn  an  I/O  mapping  related  to  human  knowledge  (in  the  form  of  fuzzy  if-then  rules).  The  ANFIS 
architecture  has  been  employed  by  various  researchers  to  model  non-linear  functions,  identify  non-linear 
components  on-line  in  a  control  system,  and  predict  chaotic  time  series  etc. 

ANFIS  identifies  an  I/O  mapping,  available  in  the  form  of  a  set  of  N  input-output  examples,  with  a  fuzzy 
architecture,  inspired  by  the  Takagi-Sugeno  modeling  approach.  The  fuzzy  architecture  is  characterised  by  a 
set  of  rules,  which  are  properly  initialised  and  tuned  by  a  learning  algorithm.  The  rules  are  in  the  form: 

1. if input 1 is A11 and input 2 is A12 then output = f1(input 1, input 2)

2. if input 1 is A21 and input 2 is A22 then output = f2(input 1, input 2)

where Aij are parametric membership functions.
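The two-rule form above can be illustrated with a minimal forward-pass sketch. It assumes Gaussian membership functions, product firing strengths and linear (first-order) consequents combined by a weighted average, which is the usual Takagi-Sugeno arrangement; none of the parameter values come from the paper, and the learning step that tunes them is not shown.

```python
import numpy as np

def gauss(x, centre, sigma):
    """Gaussian membership function (an assumed parametric form for Aij)."""
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)

def ts_two_rule(x1, x2, antecedents, consequents):
    """Output of a two-rule, two-input first-order Takagi-Sugeno system."""
    # Firing strength of each rule: product of its two antecedent memberships.
    w = np.array([gauss(x1, *a1) * gauss(x2, *a2) for (a1, a2) in antecedents])
    # Rule outputs f_r = p * x1 + q * x2 + c.
    f = np.array([p * x1 + q * x2 + c for (p, q, c) in consequents])
    # Weighted average of the rule outputs.
    return float(np.dot(w, f) / np.sum(w))

# Hypothetical parameters: (centre, sigma) per antecedent, (p, q, c) per consequent.
antecedents = [((0.0, 1.0), (0.0, 1.0)),   # rule 1: A11, A12
               ((1.0, 1.0), (1.0, 1.0))]   # rule 2: A21, A22
consequents = [(1.0, 0.0, 0.0),            # f1 = x1
               (0.0, 1.0, 1.0)]            # f2 = x2 + 1
y_mid = ts_two_rule(0.5, 0.5, antecedents, consequents)
```

At the midpoint between the two rule centres both rules fire equally, so the output is the plain average of the two rule outputs.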




Process  Model 

The experimental results obtained from the process consist of a set of input data for each variable, and its corresponding output in terms of the layer growth rate (GR) and the refractive index (RI). A set of 18 data points was used for training. Seven inputs were selected: temperature, microwave power, pressure and O2, Ar, SiCl4 and N2 flow rates. Neural network training would require many more data points to generate a good model, unlike ANFIS, which can produce a good model with little data. Since ANFIS can generate a model for one output only, two models were generated, representing the GR and RI. Figure 2 shows a block diagram of the neurofuzzy inference system for the MPACVD model.


Fig.  2.  Block  diagram  of  fuzzy  inference  system. 


The  two  outputs  from  the  process  model  are  fed  into  a  model  to  calculate  the  spectral  characteristics  of  the 
filter  [4].  A  typical  growth  record  is  shown  in  Figure  3.  The  filter  output  is  calculated  by  feeding  the  operating 
conditions  into  a  neuro-fuzzy  model  to  obtain  RI  and  GR.  Then  the  output  of  the  growing  process  is  calculated 
by  feeding  the  Optical  Thickness  (OT)  and  RI  into  the  optical  filter  equations  to  obtain  the  frequency  response 
during  each  depositing  cycle.  Optical  thickness  is  the  result  of  combining  RI  with  GR. 


Fig. 3. Rugate filter response using the process model.




Rugate  Filter 

Figure 4 shows the refractive index for a 37 layer filter and its modelled spectral characteristics. A sinusoidal refractive index profile modulated by an apodising envelope can produce a rugate filter with suppressed side bands [5].


(a) Refractive index (b) Wavelength response

Fig. 4. Refractive index profile and wavelength response for a 37 cycle filter.

The  refractive  index  is  varied  smoothly  between  two  limits  in  a  sinusoidal  manner  by  varying  the  oxygen  gas 
flow  smoothly  in  the  same  manner.  The  application  of  MPACVD  to  the  deposition  of  a  variable  index  material 
such  as  silicon  oxynitride  allows  these  theoretical  models  to  be  realised  in  practice  without  the  complexity  of 
the  approximation  techniques  involving  constant  index  layers. 


CONTROLLER  DESIGN 

The  process  of  making  filters  starts  with  calibrating  the  apparatus  under  standard  operating  conditions  in  terms 
of  refractive  index  versus  oxygen  flow  rates.  This  also  allows  the  determination  of  deposition  rate  as  a  function 
of  flow  rates,  which  is  also  required. 

Initially the number of growing cycles and the refractive index (RI) range are calculated based on the required reflectivity and bandwidth. Then the cycle length is calculated based on the growth rate obtained from the model. Depending on the deposited materials, operating conditions and filter design, the RI will vary between two limits. In our case the limits will be between the RIs of silica (1.46) and silicon nitride (2.04). Based on the cycle number, length and RI variation, sinusoidal cycles of RI are generated. The RI cycles are then fed into an inverse model of the process which utilises the RI and the other input variables as inputs, and predicts the oxygen level required for achieving the input RI. This neurofuzzy model was developed using ANFIS, which was trained using the same operating data but in a different format.
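The generation of sinusoidal RI cycles can be sketched as follows. The cycle count (37) and cycle length (12 min) are the values reported for the simulated filter in this paper; the sketch assumes the index swings over the full range between the silica and silicon nitride limits, applies no apodising envelope, and uses a hypothetical sampling density. The inverse model that converts RI to oxygen flow is not reproduced.

```python
import numpy as np

def sinusoidal_ri_profile(n_cycles=37, cycle_min=12.0,
                          n_low=1.46, n_high=2.04, samples_per_cycle=48):
    """Target refractive index versus deposition time for a plain sinusoidal rugate."""
    total_min = n_cycles * cycle_min
    t = np.linspace(0.0, total_min, n_cycles * samples_per_cycle, endpoint=False)
    mean = 0.5 * (n_low + n_high)
    amplitude = 0.5 * (n_high - n_low)
    ri = mean + amplitude * np.sin(2.0 * np.pi * t / cycle_min)
    return t, ri, total_min

t, ri, total_min = sinusoidal_ri_profile()
```

With these values the total deposition time is 37 × 12 = 444 min, and the target index stays between 1.46 and 2.04 throughout.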

In  order  to  obtain  a  uniform  sinusoid  in  optical  thickness  during  the  growing  process  and  modulation  of  the  RI, 
the  oxygen  cycle  has  to  be  modified  to  ensure  that  the  growth  rate  variation  with  oxygen  flow  rate  is  taken  into 
account.  The  modified  oxygen  cycles  are  then  fed  to  the  process  to  start  deposition  of  materials. 

During  the  growing  process,  the  filter  response  is  monitored  so  that  adjustments  may  be  made  to  the  growth 
conditions  to  counter  any  developing  defects.  Figure  5  shows  a  block  diagram  of  the  system  including  the 
model  and  controller. 


SIMULATION  RESULTS 

The  system  has  been  simulated  using  Matlab.  A  filter  was  simulated  to  achieve  a  reflectivity  of  0.996  at  a 
wavelength  of  2.0  microns  and  a  bandwidth  of  9%.  This  resulted  in  a  design  with  37  growth  cycles  and,  after 
adjusting  the  oxygen  cycle,  the  cycle  length  was  12  min.  This  means  the  growing  process  takes  444  min. 




Fig. 5. Block diagram of the MPACVD control system.

During the simulation of the growing process, a defect was introduced at cycle 11. To counter this, the software reduced the growth time for this cycle to 11 min, then the cycle length was set back to its normal length until the end of the deposition process. Without correction, the filter output would have ended up with an erroneous bandwidth. In Figure 6(a) the refractive index profile of 37 apodised sinusoidal rugate cycles is shown. The final filter spectral response is shown in Figure 6(b). The solid line represents the filter response with the correction procedure, while the dotted line represents the filter response without the correction procedure.


(a) Refractive index cycle

Fig. 6. 37 cycle rugate filter.


CONCLUSION 

MPACVD  has  been  successfully  applied  to  the  fabrication  control  of  continuously  modulated  refractive  index 
dielectric  interference  filters  of  rugate  design.  The  process  has  been  modelled  using  a  neurofuzzy  technique. 
The  model  interprets  the  input-output  relationships  captured  by  means  of  fuzzy  sets  and  inference  rules  trained 
using  the  neurofuzzy  technique.  The  model  can  be  used  to  simulate  the  behaviour  of  the  process  and  to  design 
the  control  algorithm  for  achieving  the  required  response. 




ACKNOWLEDGEMENT 

Two  of  the  authors  (from  Sheffield  University)  gratefully  acknowledge  the  UK  EPSRC  (Engineering  and 
Physical  Sciences  Research  Council)  for  their  financial  support. 

Two  of  the  authors  (from  GEC-Marconi)  gratefully  acknowledge  the  support  of  the  European  Commission 
under  Brite-EuRam  project  BE-96-3059,  OPTICOM. 


REFERENCES 

1. R.L. Chen, C.J. Spanos, 1992. Self-learning fuzzy modelling of semiconductor processing equipment. IEEE/SEMI Advanced Semiconductors Manufacturing Conference.

2. A.C. Greenham, B.A. Nichols, R.M. Wood, N. Nourshargh, L.L. Kewis, 1993. Optical interference filters with continuous refractive index modulations by microwave plasma-assisted chemical vapour deposition. Optical Eng., 32(5), 1018-1023.

3. J.R. Jang, 1993. ANFIS: adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics, 23(3), 665-685.

4. H.A. Macleod, 1986. Thin film optical filters. Adam Hilger Ltd, 2nd edition.

5. W.H. Southwell, R.L. Hall, 1989. Rugate filter side-lobe suppression using quantic and rugated quantic rugated layers. Appl. Opt., 28(4), 2949-2951.




A  Study  of  Mechanical  Properties  of  Multi-layered  Thin  Films 

T.  Hirasawa,  H.  Kotera,  S.Tawa,  S.  Shima 

Department  of  Mechanical  Engineering,  Kyoto  University 
Yoshidahonmachi,  Sakyou-ku,  Kyoto,  606-8501,  Japan 


ABSTRACT 

It is of great importance to investigate the mechanical properties of multi-layered thin films when designing
and realizing micro-electro-mechanical systems (MEMS). A new tensile test method is proposed to measure
the mechanical properties of a multi-layered thin film used in MEMS. In this method, we develop a
prefabricated test substrate with a gage portion. After Young's modulus of the prefabricated test substrate
is measured, a thin film is deposited on the test substrate and a tensile test is carried out. Young's modulus
of the thin film is obtained by subtracting the contribution of the prefabricated test substrate.
As an example, Young's modulus of a tungsten thin film deposited by sputtering is measured. We discuss
the viability of this method for measuring the mechanical properties of deposited thin films.


INTRODUCTION 

There have been a number of studies of micro-electro-mechanical systems (MEMS), such as pressure micro
sensors [1], micro motors [2] and micro switches [3]. In the fabrication of MEMS, materials are generally
deposited on other materials by sputtering, CVD or vapor deposition; the MEMS are thus composed of
multi-layered thin films. In practice they must behave in a desired manner under external forces or support
applied loads, and therefore the mechanical properties of the multi-layered deposited materials should be
clarified when designing MEMS devices.

Several methods have been proposed to measure the mechanical properties of thin films. Tabata et al.
measured Young's modulus and Poisson's ratio of silicon nitride thin film by a bulge test [4]. Weihs et al.
measured Young's modulus of Au and SiO2 by a bending test of cantilever microbeams [5] and Hashimoto
et al. measured a Co-Ta-Zr film by a three-point bending procedure [6], while Kiesewetter et al. employed
the resonance method for a silicon nitride thin film [7].

In  these  studies,  the  specimens  are  subjected  to  bending.  In  the  calculation,  it  is  assumed  that  Young's 
modulus of the deposited thin film in the tensile direction is equal to that in the compressive direction. However,
it  is  common  knowledge  that  the  internal  structure  of  the  deposited  thin  film  made  by  sputtering  or  CVD  is 
granular  or  porous,  and  that  the  mechanical  properties  of  the  deposited  thin  film  may  be  different  from 
those  of  the  bulk  material.  For  example,  piezo-electric  films  such  as  PZT  or  ZnO,  are  of  an  oriented 
structure  [8],  which  results  in  highly  anisotropic  properties.  Therefore,  to  clarify  the  mechanical  properties 
of  the  thin  film,  test  methods  that  cause  a  stress  distribution  such  as  in  bending  may  be  inappropriate. 

Conventionally, the tensile test is the most familiar and straightforward test. Sharpe et al. and Ando et al.
proposed an on-chip tensile test [9,10]. They measured Young's modulus of silicon films fabricated on
silicon substrates. In their methods, the gage portion is fabricated by a surface micro-machining process,
and  then  the  material  is  subjected  to  wet  or  dry  etching.  This  can  have  an  unpredictable  effect  on  the 
mechanical  properties  of  the  specimen.  Also,  the  internal  structure  of  the  film  is  affected  by  the  fabrication 
process.  A  scatter  of  more  than  ±20%  in  their  measured  results  may  have  been  caused  by  these  factors.  It  is 
necessary  to  develop  a  test  system  in  which  the  effects  of  the  above  factors  are  eliminated. 

In  this  study,  we  propose  a  new  tensile  test  method.  A  prefabricated  test  substrate  on  which  a  thin  film  is 
deposited is developed. The concept of our method is shown in Fig. 1. First, on a silicon substrate, the gage
portion  is  prefabricated  by  surface  micro-machining  such  as  photolithography  and  wet-etching.  After 
fabricating  the  test  substrate,  a  tensile  test  is  performed  to  measure  Young's  modulus  of  the  test  substrate. 
Secondly,  the  thin  film  we  are  concerned  with  is  deposited  on  the  prefabricated  test  substrate.  Finally,  the 


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




tensile test is again performed. Young's modulus of the thin film can be measured by subtracting the
contribution of the prefabricated test substrate. For this purpose, we also developed new
tensile test equipment and a test substrate holder. As an example, we measure Young's modulus of a
tungsten film deposited on a prefabricated test substrate by sputtering. We discuss the viability of this
method for measuring the mechanical properties of deposited thin films.


[Figure: process steps on a (110) silicon wafer, with the <110>, <112> and <111> directions and a 290 µm
dimension marked. (1) Heat the (110) silicon wafer at 1200°C for 2 hours to form 0.9 µm thick thermal SiO2.
(2) Pattern the SiO2 and etch by BHF. (3) Etch the silicon wafer by KOH, leaving a 24 µm thick gage portion.]

Fig. 1. Concept of test method.


EXPERIMENT 

Prefabricated  Test  Substrate 

Figure 2 shows a schematic drawing of the prefabricated test substrate. The gage length was 1955 µm, the
width 962 µm and the thickness of the gage portion 24 µm. To fix the test substrate, a pair of through


Fig.  2.  Schematic  drawing  of  prefabricated  test  substrate. 




1. A 0.9 µm thick thermal SiO2 layer is produced as a mask by heating a (110) silicon wafer at 1200°C for 2 hours.

2. The mask pattern is set so that the gage length direction is parallel to the (111) plane, that is, the
side-edge surfaces of the gage portion consist of (111) planes. Then the SiO2 is etched by BHF
(NH4F 50wt% : HF 49% = 10:1).

3. The gage portion is anisotropically etched by KOH.

The etching conditions are summarized in Table 1. Figure 3.a shows a photograph of the surface etched by
24wt% KOH. The mean surface roughness is ~10 µm. Figure 3.b shows the surface of the gage portion
etched by 40wt% KOH. In this case, the mean roughness was below 2 µm. As seen in these photographs,
surface roughness is affected by the etching conditions. We therefore chose 40wt% KOH at 80°C for Si etching.

Since  the  gage  portion  of  the  test  substrate  is  rather  weak  and  is  likely  to  be  broken  during  KOH  etching, 
we employed a test substrate holder made of polytetrafluoroethylene (PTFE) as shown in Figure 4.a. The
holder  is  composed  of  two  parts:  upper  holder  and  lower  holder.  The  test  substrate  was  sandwiched 
between  these  two  parts.  The  lower  holder  has  a  guide  recess  to  prevent  the  test  substrate  from  twisting  or 
bending.  There  is  a  through  hole  in  these  holders  so  that  we  are  able  to  etch  and  observe  the  gage  portion  of 
the  sandwiched  test  substrate.  Before  we  deposit  the  thin  film  on  the  prefabricated  test  substrate,  we  replace 
the  PTFE  holder  by  a  glass  one.  The  tensile  test  is  carried  out  on  the  glass  holder. 


[Figure: (a) surface etched by 24wt% KOH, roughness 10 µm; (b) surface etched by 40wt% KOH, roughness 2 µm.]

Fig. 3. Photograph of etched surface of test substrate.


Table 1: Conditions of KOH Etching Process

Solution        24wt%, 40wt%
Temperature     80°C
Etching time    1.5-2 hours

Tensile  Test  Equipment 

Figure 5 shows a schematic drawing of the test equipment. The prefabricated test substrate was fixed on the
dual axial stage by a pin. The test substrate was subjected to a tensile load by pulling the claw. The
displacement of the claw was measured by a differential transformer, which was attached to the triaxial stage
so that it could be aligned with the claw. The claw part was pulled through a 9.8 N load cell
mounted at the end of a micrometer. The micrometer was driven through a reduction gear. The claw part was
suspended by a pair of plate springs, which were long enough that the claw part moved parallel to the gage
portion; the tensile load was therefore not affected by the springs.
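
As a rough illustration of how load-cell and displacement readings translate into stress and strain for this
specimen, the sketch below uses the gage dimensions quoted earlier (length 1955 µm, width 962 µm, thickness
24 µm). The function names are hypothetical. It assumes a uniform rectangular gage cross-section and that
the measured claw displacement equals the gage elongation; the paper does not state its data reduction.

```python
# Gage dimensions of the prefabricated test substrate, from the text (in metres).
GAGE_LENGTH = 1955e-6
GAGE_WIDTH = 962e-6
GAGE_THICKNESS = 24e-6


def engineering_stress(load_n: float) -> float:
    """Tensile stress in Pa for a load in N, assuming a rectangular cross-section."""
    return load_n / (GAGE_WIDTH * GAGE_THICKNESS)


def engineering_strain(elongation_m: float) -> float:
    """Strain, assuming the measured displacement equals the gage elongation."""
    return elongation_m / GAGE_LENGTH


# Stress reached at the full 9.8 N range of the load cell, in MPa.
full_scale_stress_mpa = engineering_stress(9.8) / 1e6
```

Under these assumptions, the full 9.8 N range of the load cell corresponds to roughly 424 MPa on the bare
substrate.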




[Figure: (a) PTFE holder, showing the through hole, M5 PEEK screw, guide recess, upper holder, lower holder
and prefabricated test substrate; (b) glass holder for thin film deposition, showing the through hole,
prefabricated test substrate and deposited material.]

Fig. 4. Schematic drawing of test substrate holder.

[Figure: (a) top view and (b) side view of the microtensile test system, showing the triaxial stage, dual
axial stage, test substrate stage, plate spring, differential transformer, load cell, micrometer, reduction
gear and stepping motor.]

Fig. 5. Schematic of micro-tensile test equipment.


RESULTS  and  DISCUSSION 

As  an  example,  we  made  an  attempt  to  measure  Young's  modulus  of  a  tungsten  thin  film  by  the  proposed 
system.  Prior  to  the  measurement  of  Young’s  modulus  of  the  deposited  material,  we  measured  the  stress- 
strain  curve  of  the  prefabricated  test  substrate.  This  is  shown  in  Figure  6.  The  relationship  between  stress 
and strain is linear and Young's modulus of the prefabricated test substrate is 129.7 GPa. The test substrate
did not fracture at stresses up to 250 MPa.
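
For a linear stress-strain curve such as this one, Young's modulus is the slope of the line. The sketch
below shows one way to extract it by least squares. It assumes a straight line through the origin, and the
data points are hypothetical values generated from the reported modulus, not the measured curve.

```python
def youngs_modulus_from_curve(strains, stresses):
    """Least-squares slope of a line through the origin (stress = E * strain)."""
    numerator = sum(e * s for e, s in zip(strains, stresses))
    denominator = sum(e * e for e in strains)
    return numerator / denominator


# Hypothetical, noise-free points generated with E = 129.7 GPa; not measured data.
strains = [0.0, 0.0005, 0.0010, 0.0015]
stresses = [129.7e9 * e for e in strains]

modulus_gpa = youngs_modulus_from_curve(strains, stresses) / 1e9
```

With real data, the scatter of the points about the fitted line gives a first impression of the measurement
uncertainty.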


Fig.  6.  Typical  stress-strain  curve  for  test  substrate. 




A 0.6 µm thick tungsten film was deposited on the test substrate by sputtering. The sputtering conditions are
shown in Table 2. After the tungsten was deposited, a tensile test was performed again. The stress-strain
curve is also linear, and Young's modulus was obtained as 132.7 GPa. Since the tungsten was deposited
on the silicon wafer, no diffusion has occurred. The measured Young's modulus of the tungsten film is 246
GPa, while that of bulk tungsten is reported as 352 GPa; the former is ~30% lower than the latter. This
may be attributed to the difference in internal structure, i.e., porosity. Figure 7 shows AFM images of a 0.3
µm thick tungsten film with a mean surface roughness of 2.3 nm and of a 0.6 µm thick tungsten film with a
mean surface roughness of 4.5 nm. From these images, the structure seems to be composed of granular
material; the grain size is about 40 nm in diameter for the former film and about 8 nm for the latter. If the
film is porous, Young's modulus is lower than that of the fully dense material. Both the mean roughness and
the grain size increase with increased thickness or sputtering time. Since the internal structure of the
tungsten film may grow in its thickness direction, the in-plane properties may be inferior to those
of the isotropic bulk material.
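
The paper does not state the formula used to separate the film modulus from the composite measurement. One
common assumption is a parallel (iso-strain) rule of mixtures, in which film and substrate share the same
strain and width, so that E_c (t_s + t_f) = E_s t_s + E_f t_f. The sketch below applies that assumed
relation to the rounded values quoted in the text; the function name is hypothetical.

```python
def film_modulus(e_composite, e_substrate, t_substrate, t_film):
    """Film modulus from an iso-strain rule of mixtures (an assumed model).

    Solves E_c * (t_s + t_f) = E_s * t_s + E_f * t_f for E_f.
    Moduli share one unit (here GPa); thicknesses share one unit (here um).
    """
    return (e_composite * (t_substrate + t_film) - e_substrate * t_substrate) / t_film


# Values quoted in the text: 132.7 GPa with the film, 129.7 GPa without it,
# 24 um gage thickness and a 0.6 um tungsten film.
e_film_gpa = film_modulus(132.7, 129.7, 24.0, 0.6)
```

With these rounded inputs the assumed model gives about 253 GPa, close to but not identical to the reported
246 GPa; the difference may reflect rounding of the quoted moduli or a different data reduction by the
authors. Because the film is only 0.6 µm thick on a 24 µm substrate, a change of 0.1 GPa in either measured
modulus shifts the result by about 4 GPa, which shows how sensitive the subtraction is.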


Table 2: Sputtering condition for tungsten

RF power          60 W
Ar pressure       6.7 10'Pa
Sputtering rate   0.6 µm/hour

Fig.  8.  AFM  surface  image  of  tungsten. 


In layered thin films, diffusion may occur, affecting the mechanical properties as shown in our previous
work [11]. Furthermore, it is well known that the internal structure of the thin film is affected by its
substrate or by a pre-deposited film. In the proposed test method, if we repeat the tensile test and deposition
of the thin film alternately, then we can accurately measure Young's modulus of each film and of the
multi-layered film. The effects of the above factors on the mechanical properties of a single layered or
multi-layered film can thus be investigated.


CONCLUSION 

A  method  to  measure  Young’s  modulus  of  thin  films  using  a  prefabricated  test  substrate  was  proposed.  To 
confirm  the  viability  of  the  method,  a  tungsten  film  deposited  on  the  test  substrate  was  measured.  We 
therefore  have  shown  the  potential  of  this  method  to  measure  precisely  the  mechanical  properties  of  thin 
films. 




ACKNOWLEDGEMENT 

This  work  was  supported  by  a  Grant-in-Aid  for  Scientific  Research  (C)(No.  10650258)  by  the  Ministry  of 
Education,  Science,  Sports  and  Culture. 


REFERENCES 

1. Mastrangelo, C.H. et al., 1996. Surface-Micromachined Capacitive Differential Pressure Sensor with
Lithographically Defined Silicon Diaphragm. Journal of Microelectromechanical Systems, 5(2), 98-105.

2. Tai, Y. et al., 1989. IC-processed Electrostatic Synchronous Micromotors. Sensors & Actuators, 20, 49-55.

3. Zavracky, P.M. et al., 1997. Micromechanical Switches Fabricated Using Nickel Surface
Micromachining. Journal of Microelectromechanical Systems, 6(1), 3-9.

4. Tabata, O. et al., 1989. Mechanical Property Measurements of Thin Films Using Load-Deflection of
Composite Rectangular Membranes. Sensors and Actuators, 20, 135-141.

5. Weihs, T.P. et al., 1988. Mechanical deflection of cantilever microbeams: A new technique for testing the
mechanical properties of thin films. J. Mater. Res., 3(5) Sep/Oct, 931-942.

6. Hashimoto, K. et al., 1994. Development of Precision Three-Points Bending Machine for Measuring
Young's Modulus of Thin Films for Electronic Devices. J. Soc. Mat. Sci. Japan, 43(489), 703-709.

7. Kiesewetter, L. et al., 1992. Determination of Young's moduli of micromechanical thin films using the
resonance method. Sensors and Actuators A, 35, 153-159.

8. Kotera, H., 1999. Piezoelectric Property of CVD ZnO Film for Pressure Micro Sensor. Advances in
Information Storage Systems (AISS), 9, in press.

9. Sharpe, W.N. et al., 1997. Measurements of Young's Modulus, Poisson's ratio, and Tensile Strength of
Polysilicon. Journal of Microelectromechanical Systems, 6(3), 193-199.

10. Ando, Taeko et al., 1999. Measurement of Stress and Strain of Single-Crystal-Silicon Thin Film during
On-Chip Tensile Test. T. IEE Japan, 119-E(2), 67-72.

11. Hirasawa, T. et al., 1999. A study of the effect of the fabrication process on the diffusion in a layered
thin film. Microsystem Technologies 26, in press.




Fast  3D  -  Surface  Quality  Control 

Jurgen  Leopold 


Society  for  Production  Engineering  and  Development 
Department  for  Calculation  and  Testing 
Lassallestr.  14 

D-09117 Chemnitz, Germany
Email:  100536.1232@compuserve.com 


ABSTRACT 

Inspection is the process of determining if a product deviates from a given set of specifications. Fast
inspection usually involves measurement of specific part features such as assembly integrity, geometric
dimensions and surface finish. It is a quality control task, but is distinguished from testing tasks. The visual
inspection of 3D parts (bulk materials and also thin films) is a special task within manufacturing that has
been automated at a comparatively slow pace up to this time. Different optical methods are applied for 3D
measurement of microscopic and macroscopic bulk parts and thin films, based on the Laser Scanning
Technique (LST), the Projected Fringes Method (PFM), Electronic Speckle Pattern Interferometry (ESPI),
White-Light Interferometry (WLI) and also on SEM and AFM methods. In addition, stylus instruments
and colorimeter measurements will be included. The physical characterization of the surface
topography is the scientific basis for the next step, the intelligent interpretation. Advantages and
disadvantages of different contemporary intelligent methods (Neural Net based Classification, Fuzzy
Clustering, Fractal Analysis) and some industrial applications will be discussed.

Keywords 

Surface Characterization, Bulk and Thin-Film Inspection, Laser Scanning, Projected Fringes Method,
Photogrammetric Method, Electronic Speckle Pattern Interferometry, AFM, SEM, Fractal Analysis,
Neural Net


INTRODUCTION 

Companies establish Quality Assurance policies in order to guarantee the best quality
products and technical support for their clients. In-process inspections must therefore cover all levels of
production to meet the requirements of ISO 9000.

The  surface  of  a  solid  is  that  part  of  the  solid  that  represents  the  boundaries  between  the  solid  body  and  its 
environment.  Surfaces  as  physical  entities  possess  many  attributes,  geometry  being  one  of  them.  Surface 
geometry  by  nature  is  three-dimensional  and  the  detailed  features  are  termed  topography.  In  engineering, 
topography  represents  the  main  external  features  of  a  surface. 

In  practice,  the  notion  of  a  surface  extends  to  sub-layers  of  solid  boundaries  and  the  surface  assumes  certain 
internal  features.  These  internal  features,  e.g.  hardness,  residual  stress,  deformation,  chemical  composition 
and  reactions,  microstructure,  are  often  of  foremost  concern  in  an  application  and  surface  topography  often 
interrelates  with  these  features,  in  complicated  manners  and  in  three  dimensions,  to  manifest  certain 
engineering  properties.  Surface  topography  is,  therefore,  significant  for  surface  performance  and  the 
importance  of  surface  topography  measurement  as  a  means  of  functional  analysis  and  prediction  is 
indisputable. Engineering surfaces are created in various ways, typically by machining, surface treatment
and coating. Surface topography modification is therefore performed by material removal, transformation
or addition. Most often a combination of various machining, treatment and coating operations is employed
to  produce  surfaces  with  characteristics  that  are  desirable  for  a  particular  application.  Each  surface 
generation  process  produces  surface  topography  characteristic  of  the  process  and  process  variables  used. 
Surface  topography,  therefore,  contains  signatures  of  the  surface  generation  process  and  as  such  can  be 




used  to  diagnose,  monitor  and  control  the  manufacturing  process.  In  an  engineering  sense,  the  ultimate 
objective  of  surface  topography  measurement,  as  a  means  of  control  and  knowledge,  is  to  establish  a 
correspondence  between  an  engineering  surface  phenomenon  (e.g.  wear,  chatter,  etc.)  and  its  topographical 
characteristics  (e.g.  bearing  area  and  oil  retention  volume,  waviness  power  and  periodicity,  etc.).  Surface 
topography measurement, therefore, serves as a vital link between manufacturing, functional performance
analysis  and  prediction,  and  surface  design. 

The relationships between surface design, function, manufacturing and assessment based on the measuring
techniques, the physical characterization and intelligent interpretation are schematically shown in Fig. 1. It
is well known that the geometrical topography of tools and sheet metal parts depends on their manufacturing
conditions. On the other hand, the functional performance of the 3D geometry and the quality are strongly
related to the geometrical characteristics of the surface topography. Additionally, the demands of
high-productivity manufacturing require that advanced process control balance functional properties,
3D topography and surface characterization. There are different optical, non-contact methods to
investigate surface structures and morphologies [1].


Fig. 1: Foundations of an Intelligent Surface Characterization


SPECKLE  MEASUREMENT 

Two speckle patterns of the same surface, recorded at different laser illumination angles, are correlated.
The degree of correlation depends on the surface roughness and is expressed by the correlation
coefficient ρ₁,₂, which is given by the ratio of the covariance of the intensities in the two speckle patterns
to their variance. It is this correlation coefficient that is related to the surface roughness. The angular
correlation of the speckle phase images is used to obtain the desired 3D information of the investigated object
surface.
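
A minimal sketch of such a correlation coefficient is given below. It assumes that the covariance of the
two intensity patterns is normalized by the geometric mean of their variances, which is one common reading
of the definition; the function name and the pixel values are hypothetical.

```python
import math


def speckle_correlation(intensities_1, intensities_2):
    """Normalized correlation coefficient of two equally sized intensity lists."""
    n = len(intensities_1)
    if n == 0 or n != len(intensities_2):
        raise ValueError("patterns must be non-empty and of equal size")
    mean_1 = sum(intensities_1) / n
    mean_2 = sum(intensities_2) / n
    covariance = sum(
        (a - mean_1) * (b - mean_2) for a, b in zip(intensities_1, intensities_2)
    ) / n
    variance_1 = sum((a - mean_1) ** 2 for a in intensities_1) / n
    variance_2 = sum((b - mean_2) ** 2 for b in intensities_2) / n
    return covariance / math.sqrt(variance_1 * variance_2)


# Hypothetical flattened pixel intensities of two speckle patterns; not measured data.
pattern_a = [10.0, 40.0, 25.0, 5.0, 60.0]
pattern_b = [12.0, 38.0, 27.0, 9.0, 55.0]

rho = speckle_correlation(pattern_a, pattern_b)
```

Identical patterns give a coefficient of 1, and the value falls as the two patterns decorrelate.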

Butters and Leendertz [1] developed Electronic Speckle Pattern Interferometry (ESPI) in 1969 as a
non-destructive testing technique to determine the full displacement field of a diffusely reflecting object.
It was not until 1984, however, that the introduction of phase-shifting techniques in speckle interferometry
automated the quantitative analysis of the speckle fringe pattern. During the last ten years, progress in
computer technology and image processing has made speckle interferometry more and more attractive
for research work as well as for the first industrial applications.

Whereas in holographic interferometry two electromagnetic waves interfere, the interference mechanism in
the similar ESPI is different. Here two laser waves illuminate a surface of interest, and two speckle fields
are produced in space by the well known speckle effect of the optically rough surface and Huygens's
principle; these speckle fields interfere with each other.




There are mainly three different techniques to measure the 3D shape or contour of an object in speckle
interferometry: the angular, spectral and refractive speckle contouring techniques. In the angular speckle
contouring technique, the illumination geometry, that is, the object illumination angles, is changed
between two subsequent phase image recordings. Butters, Jones and Wykes carried out the initial
research work on contouring the optically rough surface of an object using the dual illumination approach.
An ESPI contouring technique was first proposed by S. Winther and Slettemoen in 1984. The spectral
speckle contouring technique uses two different laser wavelengths for two subsequent phase image
recordings; alternatively, the surface of interest can be illuminated by two waves of different wavelengths
simultaneously, so that only one phase image is recorded.

In the refractive speckle contouring technique, the refractive index of the medium between the CCD chip and
the object surface has to be changed between the two required recordings of the phase fringe patterns.
Speckle shearing interferometry has also been used to determine the slope and shape of a surface. Ganesan
and Sirohi proposed a new method of speckle contouring using digital speckle pattern interferometry
(DSPI) in 1988 to obtain the 3D contour of an object in real time. The technique makes use of DSPI with an
in-plane sensitive optical set-up. The main difference between Ganesan and Sirohi's method and ours is that
in their method the object is tilted, whereas in our contouring method only the illuminating angles are
changed and the object surface remains fixed. Furthermore, they do not use the incremental addition of phase
images. The usefulness of the contouring technique in ESPI is demonstrated, for example, in combination with
displacement field measurements of curved surfaces, where it makes it possible to identify the displacement
vector with the corresponding surface point; this is impossible if the shape of the surface is unknown. In
one of the first practical applications, in 1991, Paoletti and Spagnolo demonstrated a speckle contouring
technique by DSPI for surface inspection, for which they developed an interferometer for the investigation
of surface defects in deterioration studies.


EXPERIMENTAL SET-UP FOR ANGULAR SPECKLE CONTOURING MEASUREMENT

The  optical  set-up  of  the  two-beam  angular  speckle  contouring  interferometer  shown  in  Fig.  1  consists  of 
an  optical  head,  containing  a  laser  diode,  a  phase-shifting  device  and  a  standard  CCD  camera,  a  driver  unit 
for  diode,  shutter  and  CCD  camera,  a  VME-bus  for  data  transfer,  a  PC  for  image  processing,  a  TV  monitor 
for  real-time  control,  two  plane  mirrors  on  turntables  with  scanning  motors  and  a  corresponding  driver  unit. 


The  laser  diode  has  a  power  of  32  mW  and  produces  light  with  a  wavelength  of  about  828  nm  in  the  near 
infrared. The phase-shifting device is a piezo-transducer element (PZT). The camera produces 512 x 512 x
8 bit images [2].


[Figure: ESPI head with image processing, two spherical mirrors, two plane mirrors on turntables, and the
specimen.]

Fig. 1. Angular speckle contouring set-up




Speckle contouring measurements and corresponding roughness evaluations have been performed on milled
Rugo test surfaces [3]. The measurement range covers average mean roughness values from Ra = 12.5
µm down to 1.6 µm. The results clearly demonstrate that the speckle interferometric technique used resolves
surface structures in the µm range with the advantage of a full-field method (Fig. 2 to Fig. 4).


Fig.  2.  Phase-map  of  the  surface 


Fig.  3.  Pseudo  3D-  map 


[Plot: roughness Ra [µm] versus row number [a.u.]; average roughness: Ra = 1.61 µm.]

Fig. 4. Roughness evaluation


Table  1  gives  the  measured  and  averaged  speckle  roughness  values  Ra  in  comparison  with  stylus 
profilometric reference results. The 1D profilometric reference measurements were carried out with a
Talystep  profilometer  with  diamond  stylus.  To  get  a  comparable  speckle  value,  Ra  has  been  averaged  over 
all  measured  rows.  The  profile  lengths  were  chosen  according  to  the  ISO-normed  cut-off  wavelengths. 
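
The row averaging described above can be sketched as follows. The sketch assumes the usual arithmetic mean
roughness, Ra = mean |z - mean(z)|, evaluated row by row on a height map; it omits the ISO cut-off filtering
mentioned in the text, and the function names and height values are hypothetical.

```python
def row_ra(heights):
    """Arithmetic mean roughness Ra of one profile: mean |z - mean(z)|."""
    mean_height = sum(heights) / len(heights)
    return sum(abs(z - mean_height) for z in heights) / len(heights)


def averaged_ra(height_map):
    """Average of the per-row Ra values over all rows of a height map."""
    return sum(row_ra(row) for row in height_map) / len(height_map)


# Hypothetical 3 x 4 height map in micrometres; not measured data.
height_map = [
    [0.0, 2.0, 0.0, 2.0],
    [1.0, 3.0, 1.0, 3.0],
    [0.0, 4.0, 0.0, 4.0],
]

ra_um = averaged_ra(height_map)
```

Averaging over rows reduces the full-field measurement to a single number that can be compared with a 1D
stylus trace.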


Table 1: Comparison of average mean roughness Ra [µm]

speckle contouring    stylus reference
12.75                 12.50
6.25                  6.30
3.07                  3.20
1.61                  1.60

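
To make the agreement in the table easier to judge, the short sketch below computes the relative deviation
of each speckle value from its stylus reference. The pairing of values by row follows the table; the helper
name is hypothetical.

```python
# Ra values in micrometres, taken from the comparison table.
speckle = [12.75, 6.25, 3.07, 1.61]
stylus = [12.50, 6.30, 3.20, 1.60]


def relative_deviation_percent(measured, reference):
    """Signed deviation of a measured value from its reference, in percent."""
    return 100.0 * (measured - reference) / reference


deviations = [relative_deviation_percent(m, r) for m, r in zip(speckle, stylus)]
largest_deviation = max(abs(d) for d in deviations)
```

For the tabulated values the deviations are about +2.0%, -0.8%, -4.1% and +0.6%, so the speckle results
agree with the stylus reference to within roughly 4%.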

NEW  SYSTEM  FOR  FAST  3D-SURFACE  CHARACTERIZATION 

The newly developed system for fast 3D surface characterization is illustrated in Figures 5 and 6. Using
projected fringe methods, angular speckle correlation techniques, white light interferometry and atomic
force microscopy, the 3D structure of a technical surface (bulk or thin layers) may be determined. The
software tool Layers (Fig. 5) was developed to characterize the material structure of the surfaces. Based
on Intelligent Methods (Fast Fourier Transformation, Neural Nets, Fractal Analysis, Wavelet Transforms
and Power Spectra), a real-time system for bulk and thin-film inspection of surfaces has been developed
(Fig. 6). Applications for sheet metal parts and surface characterization will be discussed.

ACKNOWLEDGMENT 

The  investigations  were  partly  sponsored  by  the  German  Research  Council,  the  German  Ministry  of 
Economy  and  the  Foundation  for  Promotion  of  Advanced  Automation  Technology  /  Japan. 

REFERENCES 

1. J. Leopold, H. Günther, R. Leopold, 1998. Metrology Based Surface Quality Control. Proceedings 6th
ISMQC IMEKO Symp., Metrology for Quality Control in Production, Vienna/Austria, 401-408.

2. J. Leopold, H. Günther, 1998. Fast Characterization of Glossy Surfaces by Means of Coherent Radiation.
Proc. 3rd Inter. Kongreß und Fachausstellung für Optische Sensorik, Meßtechnik & Elektrotechnik,
Erfurt, 18.-20. Mai, 179-184.

3. J. Leopold, M. Hertwig, H. Günther, B. Staeger, 1996. 3D-Measurement of Macro- and Microdomains
using optical methods. Proceedings of the IX. Int. Oberflächenkolloquium, Chemnitz, Germany.




[Figure panels: Projection Fringes Technique; White-Light Interferometry; 3D Characterization; Fast
Characterization.]

Fig. 5: Experimental methods for Fast 3D Surface Characterization




[Figure: "3D Fast Characterization of Surfaces", comparing Fast Fourier Transformation, Fractal Dimension,
Gloss Measurement, Wavelet Parameter and Power Spectra on a scale from 0.00 to 1.00.]

Fig. 6: Results of different methods for surface quality control.




A  Study  on  a  Novel  Smoothing  Method  by  Atomic  Layer  Epitaxy 

for  Microstructure  Fabrication 

S. Hirose*, A. Yoshida**, M. Yamaura** and H. Munekata**

*Mechanical  Engineering  Laboratory,  AIST,  MITI, 

1-2  Namiki,  Tsukuba,  Ibaraki  305-8564,  Japan 
**Imaging  Science  and  Engineering  Laboratory,  Tokyo  Institute  of  Technology, 
4259  Nagatsuda,  Midori-ku,  Yokohama  226-8503,  Japan 


ABSTRACT 

We  prove  that  atomic  layer  epitaxy  (ALE)  provides  a  novel  technique  to  smooth  a  relatively  rough  GaAs 
surface.  The  method  has  been  applied  successfully  to  smooth  chemically  etched  V-grooved  GaAs 
structures and selectively grown GaAs stripe structures. The key advantage is that ALE is governed by a
two-dimensional island growth mode.


INTRODUCTION 

Microstructure  formation  of  semiconductor  devices  has  been  developed  primarily  using  photolithography 
and etching [1]. Surface roughness, being unavoidable in nano-/micro-structure processing, is a key
problem  which  leads  to  degradation  in  physical  properties  and  device  performance.  One  typical  example  is 
that  the  edge  roughness  of  mask  layers  formed  by  the  etching  process  results  in  striations  on  side  facets. 
Smoothing  of  damaged  surfaces  is  required  for  fabrication  of  high  quality  advanced  semiconductor 
microstructures.  This  paper  investigates  the  application  of  ALE  to  the  final  smoothing  process. 


THE  CONCEPT  OF  SURFACE  SMOOTHING 

ALE of GaAs has been studied for the last decade and is expected to realize nano-structures with thickness
controllability at the monolayer (ML) level. ALE growth is limited to exactly 1 ML per source supply
cycle by supplying trimethylgallium [(CH3)3Ga] and arsine [AsH3] alternately; alkyl radicals remaining
on the surface prevent further Ga adsorption. The advantages of ALE are: (i) self-limiting growth
[2], (ii) good thickness controllability and uniformity [3], and (iii) excellent selectivity on different surfaces
[4]. In this study we have found an additional feature of ALE: the growth is dominated
by a two-dimensional island growth mode. We inferred that these qualities may be applicable to
smoothing damaged surfaces of microstructures without significantly changing the shape and size of the
original structures.

[Figure labels: Rough Surface (x facets); epi. layer; Growth by ALE Mode]

Fig. 1. Schematic illustration showing the method to smooth the rough surface by using ALE growth mode.


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




The concept of the surface smoothing method is schematically illustrated in Fig. 1. For simplicity, let us
assume that the rough surface consists of x and y planes. We then choose specific ALE growth
conditions such that the growth rate on the x surface is much lower than that on the y surface. Growth
on the x surface then proceeds very slowly relative to that on the y surface, automatically creating a flat x
surface. The flatness of the x surface is further improved by the self-limiting formation of two-dimensional (2D)
islands. When the ALE growth process is completed, the result is a smooth surface.


SMOOTHING  FOR  V-GROOVED  GaAs  STRUCTURE 

We first describe an experiment to smooth the (111)A surfaces in V-grooved GaAs structures. The
V-grooved structures were patterned by wet chemical etching through a SiO2 mask prepared by chemical
vapor deposition (CVD) and photolithography. Figure 2 (a) shows the bottom part of the as-etched structures, in which
we can clearly observe a number of etch pits and line-shaped striations on the (111)A side walls. The
striations are attributed to edge roughness of the patterned conventional photoresist polymer layer.


Fig.  2.  SEM  images  of  V-grooved  GaAs  surfaces  after 
(a)  wet  chemical  etching,  (b)  growth  by  MOVPE,  and  (c)  growth  by  ALE. 


Figure 2 (b) shows the bottom part of the V-groove after growing a thin GaAs layer by a conventional metal
organic vapor phase epitaxy (MOVPE) process. To allow a fair comparison between MOVPE and ALE,
the total layer thickness on the (001) GaAs surface was kept constant at 200 nm. When MOVPE growth
was performed to smooth the V-grooved surface, the initial surface striations were emphasized, reflecting
the unevenness of the underlayer (Fig. 2 (b)). This is likely a natural consequence of the step-flow mode in the
MOVPE process (Fig. 3 (a)). Another problem with the MOVPE sample is that the sharp bottom profile of the
V-groove is rounded, which signifies variations in surface orientation.


In  contrast  to  the  results  obtained  by  MOVPE,  the  smoothness  of  surfaces  of  V-grooved  structures  is 
clearly  improved  through  application  of  the  ALE  process,  as  shown  in  Fig.  2  (c).  Figure  4  shows  the 
dependence of the GaAs growth rate on substrate temperature (Ts) for (001) and (111)A surfaces. The ALE
conditions  used  for  this  experiment  were  Ts  =  480  °C,  TMG  and  AsH3  supplied  at  1.5x10''  umol/s  and  3.0x 
101  pmol/s  for  3  and  10  s,  respectively,  with  3  s  of  hydrogen  purge  between  each  source  supply.  The  ALE 
growth rate on the (001) plane saturates at 1 ML per cycle, indicating self-limiting behavior. On the other
hand, the growth rate on the (111)A plane is relatively slow. On the basis of these data, we chose the
substrate temperature Ts = 480 °C.
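
As a rough worked example of what these conditions imply, the sketch below estimates how many ALE cycles a 200 nm layer on (001) GaAs would need at 1 ML per cycle, and how long that would take. It assumes the standard GaAs lattice constant (0.56533 nm, so one (001) monolayer is half of that) and assumes each cycle contains two hydrogen purges, one after each source pulse; neither the cycle count nor the run time is stated in the paper, and the function names are illustrative only.

```python
import math

A_GAAS_NM = 0.56533          # GaAs lattice constant (nm), standard literature value
ML_001_NM = A_GAAS_NM / 2.0  # thickness of one (001) monolayer (Ga + As plane)


def ale_cycles(thickness_nm, ml_per_cycle=1.0):
    """Number of ALE cycles needed to reach the given thickness on (001)."""
    return math.ceil(thickness_nm / (ml_per_cycle * ML_001_NM))


def cycle_time_s(tmg_s=3.0, ash3_s=10.0, purge_s=3.0):
    """Duration of one cycle: TMG pulse, purge, AsH3 pulse, purge (assumed order)."""
    return tmg_s + purge_s + ash3_s + purge_s


cycles = ale_cycles(200.0)                     # 200 nm layer quoted in the text
total_hours = cycles * cycle_time_s() / 3600.0
print(f"{cycles} cycles, about {total_hours:.1f} h at {cycle_time_s():.0f} s per cycle")
```

Under these assumptions the smoothing run is on the order of several hundred cycles, which illustrates why thickness control at the monolayer level comes at the cost of a long growth time.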




In  Fig.  2  (c),  we  can  clearly  see  that  the  ridges  appearing  in  Fig.  2  (a)  vanish  almost  completely  after  ALE 
processing.  Also,  a  (001)  plane  is  spontaneously  developed  at  the  bottom  of  the  V-groove.  Furthermore, 
sharp edges are clearly developed at the intersection of the (001) and (111)A surfaces. This is probably
explained  in  terms  of  the  difference  in  the  nucleation  mechanism  between  MOVPE  and  ALE.  Atomic  force 
microscopy  (AFM)  observations  have  verified  that  ALE  is  primarily  driven  by  formation  of  2D  islands 
(Fig.  3  (b)).  We  have  concluded  that  the  ALE  growth  process  achieves  smooth  GaAs  surfaces. 


Fig. 3. AFM plan-view image of the GaAs on the (001) surface grown by (a) MOVPE and (b) ALE.

Fig. 4. (a) Growth rates on (001) and (111)A GaAs surfaces as a function of substrate temperature (°C).
(b) Schematic illustration of a V-grooved structure.


SMOOTHING  FOR  RIDGE  GaAs  STRUCTURE 

We then applied ALE to smooth a ridge structure containing (110) and (111)B surfaces, which was
grown selectively by conventional MOVPE at Ts = 700 °C. As can be seen in Fig. 5 (a), many wavy
striations exist on the (110) surface; moreover, the (111)B surface was not flat, having irregular holes,
particularly at the top of the ridge.

To eliminate these defects, we applied ALE under conditions in which the growth rate on
the (110) surface is lower than on the other surfaces. Figure 6 shows the dependence of the
GaAs growth rate on Ts for (001), (111)B and (110) surfaces. Ts = 480 °C satisfies this condition: the
1-ML self-limiting ALE growth mode is almost realized on the (001) surface, while the growth rate on the
(110) surface is less than 1 Å per cycle. The growth rate on the (111)B plane lies between those on the
(001) and (110) surfaces.

After applying the ALE smoothing process, the wavy surface features on the (110) surface, formed
originally by the MOVPE process, are almost entirely removed, as shown in Fig. 5 (b). Also, the (111)B and (001)
surfaces have a degree of smoothness equivalent to surfaces formed by crystal cleavage. However, the
inner (111)B plane still remains somewhat rough. We infer that the roughness of these defective structures
was too large to smooth out completely in a single smoothing process. Applying another ALE process
with different conditions would result in further improvement in the surface roughness.


Fig.  5.  (a)  SEM  image  of  a  side  wall  of  a  GaAs  stripe  structure  grown  by  selective  area  MOVPE. 
(b)  SEM  image  of  the  structure  after  ALE  smoothing  method. 


Fig. 6. (a) Growth rates on (001), (111)B and (110) GaAs surfaces as a function of substrate temperature (°C).
(b) Schematic illustration of a ridge structure.


CONCLUSION 

We have proposed and demonstrated a novel method to smooth the surface of GaAs microstructures using
atomic layer epitaxy (ALE) while keeping film thickness controllability at the monolayer level. It has been
demonstrated that ALE makes it possible to form surfaces with smoothness equivalent to that of surfaces
formed by crystal cleavage, because the growth is characterized by 2D-island nucleation.

This result is much superior to that from metal organic vapor phase epitaxy (MOVPE), which is a more traditional
processing technique. When MOVPE growth was performed, initial surface striations became emphasized,
reflecting the unevenness of the underlayer, a natural consequence of the step-flow mode in the MOVPE
process. Variations in surface orientation are also a significant problem with MOVPE processing.




ACKNOWLEDGEMENT 

The authors gratefully acknowledge Taiyo-Toyo Sanso Co., Ltd for supplying highly pure AsH3 gas. This
work  was  partially  supported  by  Research  Fellowships  of  the  Japan  Society  for  the  Promotion  of  Science 
for  Young  Scientists. 


REFERENCES 

1. R. Bhat, E. Kapon, S. Simhony, E. Colas, D. M. Hwang, N. G. Stoffel, M. A. Koza, 1991. J. Cryst.
Growth 107, 716-721.

2. T. Suntola, 1994. Handbook of Crystal Growth, ed. D. T. J. Hurle (Elsevier, Tokyo) 3, Ch. 14, 601.

3. S. Hirose, N. Kano, K. Hara, and H. Munekata, 1997. J. Cryst. Growth 172, 13-17.

4. H. Isshiki, Y. Aoyagi, T. Sugano, S. Iwai, T. Meguro, 1993. Appl. Phys. Lett. 63, 1528-1530.




Relationship  between  groove  cross-sectional  area  per  pulse  of  YAG  laser 

and  strength  of  processing  sound 

Tsuneo  Kurita*,  Tomohiko  Ono**,  Tsuyoshi  Nakai**,  Noboru  Morita*** 

*  Japan  Science  and  Technology  Corporation 
MEL, 1-2 Namiki, Tsukuba, Ibaraki, JAPAN
**  Tokyo  Metropolitan  Institute  of  Technology, 

6-6  Asahigaoka,  Hino,  Tokyo,  JAPAN 
***Faculty  of  Engineering,  Chiba  University, 

1-33  Yayoi-cho,  Inage-ku,  Chiba,  JAPAN 


ABSTRACT 

Laser processing is an important manufacturing technology for machining difficult-to-cut materials. Sound is generated
during laser processing, and its intensity changes with the processing conditions. The goal of this
research is to construct a laser processing system for the manufacture of free-form surfaces. Toward this goal, this
study aims to clarify the relationship between the strength of the laser processing sound and the groove cross-sectional area per pulse
when a Q-switched YAG laser beam is applied for laser grooving.


INTRODUCTION 

Laser processing can machine various materials, such as hard metals and ceramics, at high speed, because the input
laser energy can be concentrated on a small area. On the other hand, heating, melting
and vaporization of the material occur continuously when high-power laser energy is applied, and these phenomena make
it difficult to monitor the laser processing characteristics. In a material removal process, for instance, it is very
difficult to keep the removal depth and the processing accuracy at a constant level without process
control. Against this background, a high-accuracy control system for laser processing may be needed,
in which processing information is monitored and transferred to the laser machine controller to
form a feedback system. To this end, the use of the laser processing sound has been
tried [1]-[6]. The final goal of this research is to construct a laser processing system for the manufacture of free-form surfaces
or stepped grooves. To control the laser processing characteristics, various conditions must be manipulated.
To monitor the characteristics of laser processing by means of the generated sound, it is very important to clarify a
fundamental relationship between the laser processing sound and the processing characteristics that is not affected by the laser
processing conditions. The authors have experimentally clarified the relationship between the cross-sectional area of a processed hole and
the sound pressure level at a specified frequency, and found that the relationship may be expressed by a
straight line when a low-frequency laser beam is used [7]. The purposes of this research are to clarify the accurate
relationship between the material removal cross-sectional area per laser pulse and the strength of the processing sound under various
processing conditions, and also to clarify the effect of the work material on this relationship when a high-frequency
Q-switched YAG laser beam is applied.

EXPERIMENTAL  SETUP 

Fig. 1 shows the experimental setup for sensing the laser processing sound and analyzing its frequency
content. A work material was set on a numerically controlled table, and a TEM00 single-mode Q-switched YAG
laser beam (500 Hz to 50 kHz) was irradiated onto a predetermined position. The generated sound was detected
by a condenser microphone, set at a distance of 40 mm from the incident position of the laser
beam with an inclination of 30 deg., and was recorded by a DAT (Digital Audio Tape) data recorder through
an amplifier. The strength of the detected sound versus processing time was measured using an FFT analyzer.
Photographs of the groove cross sections were taken by SEM (Scanning Electron Microscope) to measure
the cross-sectional area of the grooves, after the grooved specimen had been ground with a #1000 diamond grinding wheel.

EXPERIMENTAL  RESULTS 

The  measurement  of  strength  of  processing  sound 

Fig. 2 shows the relationship between the groove volume increase rate (mm3/s) and the sound pressure level at specified frequencies
of the processing sound. Here, the work material was (Al2O3+TiC) ceramic, the Q-switch frequency was 2 kHz, the laser energy was
1.10 J/s and the processing speeds ranged from 0.01 to 4.0 mm/s. It was found that the relationship between the two variables
could be expressed by a straight line at each of the four frequencies examined. This experimental result shows that it is not
necessary to select a specific frequency in order to express the relationship between the material removal volume and the
strength of the processing sound.

Fig. 1. Experimental setup for detection and analysis of laser processing sound

Fig. 2. Relationship between groove volume increase rate and sound pressure level (horizontal axis: groove volume
increase rate (mm3/s); curves labeled by frequency). Laser energy: 1.10 J/s, stage travel speed: 0.01 to 4.0 mm/s,
Q-sw. frequency: 2 kHz, work material: Al2O3+TiC

Fig. 3 shows the variation of the strength of the processing sound versus processing time at a processing speed of 1 mm/s,
a Q-switch frequency of 3 kHz and a laser energy of 1.40 J/s. In the experiments, (Al2O3+TiC) ceramic (a) and (WC+Co) sintered
carbide (b) were selected as work materials. As shown in Fig. 3, signals having the same pattern were detected repeatedly when a
high-frequency Q-switched YAG laser beam was used for laser grooving. It can be assumed that these signals coincide
with the incidence of the Q-switched laser pulses, because the duration of one signal is equal to the inverse of the Q-switch
frequency. In this report, the maximum amplitude of a signal is defined as the strength of the processing
sound.
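
The definition above (one signal per Q-switch period, strength taken as its maximum amplitude) can be sketched in code as follows. This is only an illustration under stated assumptions: the sampling rate of 48 kHz and the synthetic decaying-burst signal are hypothetical, and the function name `pulse_strengths` is not from the paper, which used a DAT recorder and an FFT-based measurement.

```python
import math


def pulse_strengths(signal, sample_rate_hz, q_switch_hz):
    """Split a recording into windows of one Q-switch period (1 / q_switch_hz)
    and return the peak absolute amplitude found in each window."""
    samples_per_pulse = round(sample_rate_hz / q_switch_hz)
    if samples_per_pulse < 1:
        raise ValueError("sample rate must be at least the Q-switch frequency")
    n_pulses = len(signal) // samples_per_pulse
    return [
        max(abs(s) for s in signal[k * samples_per_pulse:(k + 1) * samples_per_pulse])
        for k in range(n_pulses)
    ]


# Synthetic stand-in for a recording (hypothetical data): one decaying burst per pulse.
SAMPLE_RATE_HZ = 48_000  # assumed sampling rate, not stated in the paper
Q_SWITCH_HZ = 3_000      # Q-switch frequency quoted for Fig. 3
peaks = [0.20, 0.15, 0.10]
samples_per_pulse = SAMPLE_RATE_HZ // Q_SWITCH_HZ
signal = [
    p * math.exp(-i / 4.0) * math.cos(i)
    for p in peaks
    for i in range(samples_per_pulse)
]

strengths = pulse_strengths(signal, SAMPLE_RATE_HZ, Q_SWITCH_HZ)
print(strengths)
```

Windowing by the Q-switch period only works if the recording starts at a pulse boundary; with real data the windows would first have to be aligned to the laser trigger.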

The  measurement  of  cross-sectional  area  of  groove 

A circle whose diameter d (µm) is equal to the focused spot
of the laser beam is imagined on the surface of the work material.
The number of laser pulses incident while the beam travels across this circle
can be calculated as (d · q)/v,
where q (kHz) is the Q-switch frequency and v (mm/s) is the
processing speed. This maximum number of incidences is
called the "Maximum Incident Times (MIT)" in this report.
Fig. 4 shows the relationship between the cross section of the
processed groove and MIT. As shown in the figure, the
depth of the groove changes according to both the number of
incidences and the work material.
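
A short numerical check of the MIT formula may help. The sketch below evaluates (d · q)/v in the units given in the text (µm, kHz, mm/s), for which the unit factors cancel. It assumes a spot diameter of d = 80 µm, taken here from the "initial width of groove: 80 µm" annotation of Fig. 4; the paper does not state d explicitly, so this is an inference. With q = 3 kHz, the stage travel speeds labeled in Fig. 4 give back the MIT values printed there.

```python
def maximum_incident_times(spot_diameter_um, q_switch_khz, speed_mm_s):
    """MIT = (d * q) / v with d in um, q in kHz and v in mm/s.

    (1e-6 m * 1e3 1/s) / (1e-3 m/s) = 1, so no conversion factor is needed.
    """
    if speed_mm_s <= 0:
        raise ValueError("processing speed must be positive")
    return spot_diameter_um * q_switch_khz / speed_mm_s


SPOT_DIAMETER_UM = 80.0  # assumption: equal to the initial groove width in Fig. 4
Q_SWITCH_KHZ = 3.0       # Q-switch frequency quoted for Fig. 4

speeds_mm_s = [1.0, 0.5, 0.3, 0.2, 0.1, 0.02]
mit_values = [
    round(maximum_incident_times(SPOT_DIAMETER_UM, Q_SWITCH_KHZ, v))
    for v in speeds_mm_s
]
for v, n in zip(speeds_mm_s, mit_values):
    print(f"v = {v} mm/s -> MIT = {n}")
```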


Laser energy: 1.40 J/s, stage travel speed: 1 mm/s, Q-sw. frequency: 3 kHz (conditions for Fig. 3)


The  relationship  between  cross-sectional  area  and  the 
strength  of  sound 

Fig. 5 shows the strength of the processing sound and the cross-sectional
area versus MIT when (Al2O3+TiC) and (WC+Co) were used as work materials.
For (Al2O3+TiC), the cross-sectional area increases rapidly up to an
MIT of about 3000, after which the rate of increase becomes moderate.
The strength of the processing sound, on the other hand, decreases as
MIT increases. For (WC+Co), the changes in the strength of the
processing sound and in the cross-sectional area are similar to
those for (Al2O3+TiC), but both values are smaller.
These results indicate that the dependence of the strength of the
processing sound and of the cross-sectional area on MIT differs
with the work material.

INVESTIGATION  AND  DISCUSSION 

Fig. 3. Distribution pattern of the strength of processing sound versus processing time

Fig. 4. SEM images of processed groove: (a) Al2O3+TiC, (b) WC+Co. Maximum incident times (stage travel speed in mm/s):
240 (1), 480 (0.5), 800 (0.3), 1200 (0.2), 2400 (0.1), 12000 (0.02). Laser energy: 1.40 J/s, initial width of groove: 80 µm,
stage travel speed: 0.01 to 1.5 mm/s, Q-sw. frequency: 3 kHz

The purpose of this research is to clarify experimentally the relationship between the strength of the laser processing
sound and the cross-sectional area removed per laser pulse when a Q-switched YAG laser is used for
grooving. The authors have previously shown that a linear relationship exists between the sound pressure level at a
specified frequency and the cross-sectional area per single laser pulse in laser drilling, where the laser beam is controlled
by an external electric circuit. However, that procedure had the following problems:

(1) It was not easy to locate the deepest position of the hole because of the difficulty of grinding, so measuring errors
could occur in the calculation of the cross-sectional area of the processed hole.

(2) A great deal of time was needed to repeat the experiment many times under the same processing condition in order
to assure accuracy.

For these reasons, the relationship between the cross-sectional area of the processed groove and the sound pressure level
could only be expressed by a straight line with scattered values. The most difficult part of the experiment is to
calculate the cross-sectional area of the processed groove accurately. In this report, the cross-sectional area
was calculated from the SEM images of Fig. 4, and the relationship between the sound pressure level and the
cross-sectional area processed per pulse of the Q-switched YAG laser beam was obtained. This procedure makes it
possible to improve the accuracy of the experimental data.

Fig. 5. Strength of processing sound and groove cross sectional area versus maximum incident times of laser beam
(horizontal axis: maximum incident times, with stage travel speeds of 0.048, 0.024, 0.016, 0.012 and 0.0096 mm/s marked).
Laser energy: 1.40 J/s, initial width of groove: 80 µm, stage travel speed: 0.01 to 1.5 mm/s, Q-sw. frequency: 3 kHz

Fig. 5 shows the relationship between MIT and the groove cross-sectional area, and also the relationship between MIT and
the strength of the processing sound. The cross-sectional area of the processed groove per laser pulse was calculated based
on Fig. 4.

Fig. 6. Relationship between groove cross sectional area per pulse and strength of processing sound (horizontal axis:
groove cross sectional area per pulse (µm²/pulse)). Stage travel speed: 0.01 to 1.5 mm/s, work material: Al2O3+TiC

The cross-sectional area of the processed groove per laser pulse, $V_p$, at the maximum incident time N can be calculated
by the following equation:

$V_p = \dfrac{V - V_0}{N - N_0} \qquad (N > N_0)$

where $V$ and $V_0$ are the cross-sectional areas of the processed groove at the maximum incident times $N$ and $N_0$,
respectively. The value of the sound pressure level at the Nth irradiation was used to indicate the correlation with the
cross-sectional area per laser pulse.
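
As a minimal sketch of this calculation, the per-pulse area can be computed from consecutive (MIT, cross-sectional area) pairs as a finite difference. The area values below are made up for illustration and are not measurements from the paper; the function name is likewise hypothetical.

```python
def area_per_pulse(area, area_ref, n, n_ref):
    """Return V_p = (V - V0) / (N - N0); only defined for N > N0."""
    if n <= n_ref:
        raise ValueError("N must be greater than N0")
    return (area - area_ref) / (n - n_ref)


# Hypothetical (MIT, groove cross-sectional area in um^2) pairs, illustration only.
measurements = [(240, 1200.0), (480, 2000.0), (800, 2800.0), (1200, 3400.0)]

per_pulse = [
    area_per_pulse(a1, a0, n1, n0)
    for (n0, a0), (n1, a1) in zip(measurements, measurements[1:])
]
print(per_pulse)
```

Using neighbouring measurements as (N0, V0) gives a local removal rate at each MIT, so a decreasing sequence reflects a groove that deepens more slowly as pulses accumulate.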

The  effect  of  processing  condition  on  the  relationship  between  cross  sectional  area  and  strength  of  processing  sound 

Fig. 6 shows the relationship between the groove cross-sectional area per pulse and the strength of the processing sound when
(Al2O3+TiC) is used as the work material. In this figure, the laser energy was varied from 0.2 to 0.62 mJ/pulse. It is clear that
the relationship between the two variables can be expressed by a straight line on a log-log chart, and that the gradient of the line is not
affected by the applied laser energy. This means that, for a given work material, the cross-sectional area of the groove per
laser pulse can be calculated solely by measuring the strength of the laser processing sound, even when the applied average laser
energy and the processing speed are changed.
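
A straight line on a log-log chart corresponds to a power law, so the monitoring idea described here amounts to fitting log10(strength) against log10(area) and then inverting the fit. The sketch below shows one way to do that with an ordinary least-squares line; the data are synthetic (generated from an arbitrary power law with exponent 0.7), and the fitted coefficients have no connection to the values in Fig. 6.

```python
import math


def fit_power_law(areas, strengths):
    """Least-squares line log10(S) = slope * log10(A) + intercept."""
    xs = [math.log10(a) for a in areas]
    ys = [math.log10(s) for s in strengths]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def area_from_strength(strength, slope, intercept):
    """Invert the fitted line to estimate area per pulse from sound strength."""
    return 10.0 ** ((math.log10(strength) - intercept) / slope)


# Synthetic calibration data (hypothetical): S = 0.02 * A**0.7
areas = [0.5, 1.0, 2.0, 4.0, 8.0]
strengths = [0.02 * a ** 0.7 for a in areas]

slope, intercept = fit_power_law(areas, strengths)
estimated_area = area_from_strength(0.02 * 3.0 ** 0.7, slope, intercept)
print(f"slope = {slope:.3f}, estimated area = {estimated_area:.3f}")
```

Because the text reports that the gradient depends on the work material, a calibration of this kind would have to be repeated for each material.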

The  effect  of  work  material  on  the  relationship  between  cross  sectional  area  and  strength  of  processing  sound 

Fig. 7 shows the results of qualitative analysis of the work materials used for the experiment. Two types of work material,
ceramics (a and b) and sintered carbides (c and d), were used. Fig. 8 shows the relationship between the strength of the
processing sound and the groove cross-sectional area per laser pulse for the four kinds of work material. Here, the
applied laser energy was 1.4 J/s, the Q-switch frequency was 3 kHz and the processing speed ranged from 0.01 to 1.5 mm/s.
The experimental data separated into two groups, one for ceramics and the other for sintered carbides,
and the gradient of the straight line for the ceramic materials was larger than that for the sintered carbides.
This means that, at the same cross-sectional area per laser pulse, a stronger sound is generated when a ceramic
is selected as the work material than when a sintered carbide is selected. Fig. 9 shows SEM images of the processed
grooves in the ceramic (a) and the sintered carbide (b) after 800 irradiations of the laser beam. The appearance of the inside
of the groove differs greatly between the two materials. In the sintered carbide, resolidified material exists in the groove,
and the groove is not formed normally. In the ceramic, on the other hand, part of the material is removed by sublimation.
From these SEM images, it is clear that more work material is removed from the ceramic than from the sintered carbide
even if the depth of the grooves is the same for both materials. It can be concluded from these SEM images that the
difference in the manner of evaporation of the matrix material affects the gradients of the two
straight lines in Fig. 8.

Fig. 7. Qualitative analysis of work material with EDS

Fig. 8. Relationship between strength of processing sound and groove cross sectional area per pulse (horizontal axis:
groove cross sectional area per pulse (µm²/pulse)). Laser energy: 1.40 J/s, stage travel speed: 0.01 to 1.5 mm/s,
Q-sw. frequency: 3 kHz

CONCLUSIONS 

The conclusions of this research are as follows.

1) The cross-sectional area of the groove increases rapidly up to a certain number of incident times of the laser beam,
and increases more gradually thereafter as the number of incidences increases. The strength of the processing sound
shows the opposite trend, decreasing as the number of incident times increases. These phenomena were observed for
both kinds of work material, ceramics and sintered carbides.

2) The relationship between the strength of the processing sound and the cross-sectional area per laser pulse could be
expressed by straight lines on a log-log chart when the applied laser energy was changed.

3) The gradient of the straight line changed between the two kinds of work material. This depends on the
laser material removal characteristics.

4) It was shown experimentally that the material removal volume with a Q-switched YAG laser can be calculated solely by
measuring the strength of the processing sound.

REFERENCES 

1. G. V. Arbach, R. L. Melcher and C. E. Scranton, 1983, Combined Acoustic and Pyroelectric Laser, IBM Tech. Disclosure
Bull. 25, 5092

2. C. E. Yeack, R. L. Melcher and H. E. Klauser, Transient Photoacoustic Monitoring of Pulsed Laser Drilling, Appl. Phys. Lett.
44, 1043

3. T. Miyazaki, T. Uyemura and Y. Yamamoto, 1973, A Study on the Mechanism of Laser Piercing, Ann. CIRP 22, 67

4. M. T. Brienza and A. J. DeMaria, 1967, Laser-Induced Microwave Sound by Surface Heating, Appl. Phys. Lett. 11, 44

5. P. Sheng and G. Chryssolouris, 1994, Investigation of Acoustic Sensing for Laser Machining Processes, Part 1: Laser
Drilling, J. Mater. Process. Technol. 43, 125

6. P. Sheng and G. Chryssolouris, 1994, Investigation of Acoustic Sensing for Laser Machining Processes, Part 2: Laser
Grooving and Cutting, J. Mater. Process. Technol. 43, 145

7. T. Ono, T. Kurita and N. Morita, 1997, Study on the relationship between material removal volume and sound pressure
level in laser processing, IPMM97, 1045


Fig. 9. SEM images of cross sections of processed groove: (a) Al2O3+TiC, (b) WC+Co. Laser energy: 1.40 J/s,
Q-sw. frequency: 3 kHz, stage travel speed: 0.3 mm/s




Optimization  of  Thickness  Distribution  of 
Micro-Membrane  by  Genetic  Algorithm 

Hidetoshi  Kotera,  Masatoshi  Senga,  Taku  Hirasawa,  Susumu  Shima 

Department  of  Mechanical  Engineering,  Kyoto  University 
Yoshida  Honmachi,  Sakyo-ku,  Kyoto,  606-8501,  Japan 


ABSTRACT 

A simulation method for optimizing the dynamic motion of micro-electro-mechanical devices and systems
(MEMS) is proposed. A series of equations for the electrostatic field, fluid dynamics and deflection of a
micro-membrane are coupled and solved simultaneously. Since a genetic algorithm is appropriate for reducing the
solution search space, it is used to optimize the thickness distribution of a micro-membrane. As an
application of the developed method, the thickness distribution of the micro-membrane of a micro-air-pump
is optimized. The prescribed performance of the micro-air-pump can be calculated.


INTRODUCTION 

A few types of micro-electro-mechanical systems (MEMS), e.g., micro optical mirrors, micro valves and
micro  actuators,  have  been  developed  and  used  in  practice.  These  devices  are  actuated  by  electrostatic  force 
and/or  fluid  pressure.  To  simulate  the  motion  or  deformation  of  the  micro  devices,  computational 
algorithms  have  been  studied  [1,2,3].  In  the  present  study,  the  air  pressure  field  is  expressed  by  the 
modified  Reynolds  equation  considering  first-order  slip  on  the  surface  of  the  material.  We  couple  the  fluid 
equation  with  the  membrane  deflection  equations  that  incorporate  in-plane  stress.  The  coupled  equations 
are  solved  by  the  finite  element  method,  where  the  electrostatic  force  is  considered  as  an  external  force. 
Since  the  electric  field  changes  according  to  the  membrane  deflection,  the  equation  of  electric  field, 
expressed by the Laplace equation, is solved by the boundary element method. The fluid, membrane
deflection and electric field equations are coupled and solved with a calculation scheme based on our
previous work [4].

The  numerical  simulation  is  intended  to  characterize  the  motion  of  the  devices  and  to  design  the  shape  of 
the  mechanism  to  realize  the  prescribed  functions.  There  have  been  studies  using  such  numerical  methods 
as  design  sensitivity  analysis  and  genetic  algorithm  [5]  to  find  the  optimum  shape  or  mechanism  that 
realizes  the  prescribed  functions.  To  produce  a  mechanical  element  of  conventional  scale,  it  is  difficult  to 
distribute  the  thickness  of,  say,  sheet  metal  to  realize  a  prescribed  performance.  In  the  field  of  MEMS,  the 
components  are  produced  by  physical  vapor  deposition  or  by  chemical  deposition,  i.e.,  a  thin  film  is 
deposited  by  CVD  or  sputtering.  By  these  processes,  it  is  easy  to  distribute  the  thickness  of  the  deposited 
thin  films.  It  may  thus  be  possible  to  develop  a  new  mechanical  structure  for  the  desired  performance. 

The  purpose  of  this  work  is  to  develop  a  simulation  method  to  calculate  the  optimum  thickness  distribution 
of  a  micro-membrane  actuated  by  an  electrostatic  force.  Since  a  genetic  algorithm  is  appropriate  to  reduce 
the  solution  search  space,  we  used  it  to  find  an  optimum  shape.  As  an  example,  the  thickness  distribution  of 
the  micro-membrane  of  a  micro-air-pump  actuated  by  an  electrostatic  force  is  optimized  to  realize  the 
prescribed  response.  We  will  show  a  numerical  optimization  method  based  on  a  genetic  algorithm  and 
discuss  the  convergence  of  the  solution  and  efficacy  of  the  developed  method. 
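
To make the optimization loop concrete, the following is a minimal genetic-algorithm sketch over a vector of membrane thicknesses. It is not the authors' implementation: the fitness function here is a stand-in (squared distance to a made-up target thickness profile) instead of the coupled finite element and boundary element analysis, and the population size, truncation selection, one-point crossover and Gaussian mutation are all assumed choices.

```python
import random


def evolve(fitness, n_genes, bounds, pop_size=30, generations=60,
           mutation_rate=0.1, seed=0):
    """Minimize `fitness` over real-valued gene vectors with a simple GA."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []  # best cost at the start of each generation, plus the final one
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]       # truncation selection; best is kept
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)  # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n_genes):         # Gaussian mutation, clipped to bounds
                if rng.random() < mutation_rate:
                    step = rng.gauss(0.0, 0.05 * (hi - lo))
                    child[i] = min(hi, max(lo, child[i] + step))
            children.append(child)
        pop = parents + children
    pop.sort(key=fitness)
    history.append(fitness(pop[0]))
    return pop[0], history


# Hypothetical target thickness profile (um) standing in for the prescribed response.
TARGET = [2.0, 1.6, 1.2, 1.0, 1.0, 1.2, 1.6, 2.0]


def surrogate_cost(thickness):
    """Placeholder objective; the paper evaluates a coupled FEM/BEM simulation."""
    return sum((t - g) ** 2 for t, g in zip(thickness, TARGET))


best, history = evolve(surrogate_cost, n_genes=len(TARGET), bounds=(0.5, 3.0))
print(f"initial cost {history[0]:.3f}, final cost {history[-1]:.3f}")
```

Because the better half of the population is carried over unchanged, the best cost never increases from one generation to the next, which makes convergence easy to monitor. In a real coupling analysis each fitness evaluation would be a full simulation, so population size and generation count would be limited by computing time.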


COUPLING  ANALYSIS  SYSTEM 

As an example of a micro-actuator driven by the electrostatic force, Figure 1 shows a model of
a micro-air-pump. The micro-membrane is the upper electrode and the bottom of the micro-pump is the
lower electrode. The air is pumped out according to the deflection of the micro-membrane actuated by the
electrostatic force.


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




Fig.  1.  Schematic  view  of  micro  pump. 

Pulled  by  the  electrostatic  force,  the  micro  membrane  undergoes  deformation  in  both  in-plane  and  out-of- 
plane  directions.  Therefore,  stress  equilibrium  equations  of  in-plane  stress  and  bending  are  used.  As  the 
micro  membrane  deflects,  the  distance  from  the  lower  electrode  changes.  Thus,  the  change  of  actuation 
force  due  to  the  micro-membrane  deflection  should  be  considered.  We  used  the  following  equations: 

1)  Electric  field 

The Laplace equation for the electrostatic field, where no space charge exists between the upper and lower electrodes, is written as

\[ \nabla^2 \phi = 0 \qquad (1) \]

where \phi is the electrostatic potential. The strength of the electrostatic field E is given by

\[ E = -\nabla \phi = -\frac{\partial \phi}{\partial n} \qquad (2) \]

where n is the outward normal to the surface of the area concerned. The electrostatic stress is written as

\[ f_e = \frac{1}{2} \varepsilon E^2 \qquad (3) \]

where \varepsilon is the dielectric constant of the fluid.
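As a rough numerical illustration of the field strength and the electrostatic stress, the sketch below uses a parallel-plate approximation E = V/h and the standard electrostatic pressure on a conductor surface, \varepsilon E^2 / 2. The drive voltage of 100 V and the function names are assumptions made for illustration only; the gap and the permittivity are the values listed in Table 1. It does not replace the boundary element solution used in the paper.

```python
EPS = 8.854e-12  # permittivity in F/m (value listed in Table 1)


def field_strength(voltage, gap):
    """Parallel-plate approximation |E| = V / h (an assumption, not the BEM solution)."""
    return voltage / gap


def electrostatic_stress(voltage, gap, eps=EPS):
    """Electrostatic pressure eps * E^2 / 2 acting on the electrode surface, in Pa."""
    e = field_strength(voltage, gap)
    return 0.5 * eps * e * e


if __name__ == "__main__":
    v = 100.0    # hypothetical drive voltage in volts (not given in the paper)
    h0 = 20e-6   # initial gap in metres (Table 1)
    print("E   =", field_strength(v, h0), "V/m")
    print("f_e =", electrostatic_stress(v, h0), "Pa")
    # Halving the gap quadruples the stress, which is why the change of the
    # actuation force with deflection has to be tracked.
    print("f_e at half gap =", electrostatic_stress(v, h0 / 2), "Pa")
```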


2)  Stress  equilibrium 

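The stress equilibrium equation for in-plane deformation is written as

\[ \frac{\partial \sigma_x}{\partial x} + \frac{\partial \tau_{xy}}{\partial y} + f_x = 0, \qquad \frac{\partial \sigma_y}{\partial y} + \frac{\partial \tau_{xy}}{\partial x} + f_y = 0 \qquad (4) \]

where f_x and f_y are body forces in the x and y directions, respectively.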


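The membrane is bent by the dynamic pressure of the fluid and by the electrostatic force. The stress equilibrium equation for bending is expressed by

\[ D_{xx}\frac{\partial^4 w}{\partial x^4} + 2\left(D_{xy} + 2D_{ss}\right)\frac{\partial^4 w}{\partial x^2 \partial y^2} + D_{yy}\frac{\partial^4 w}{\partial y^4} - t_m\left(\sigma_x\frac{\partial^2 w}{\partial x^2} + 2\tau_{xy}\frac{\partial^2 w}{\partial x \partial y} + \sigma_y\frac{\partial^2 w}{\partial y^2}\right) = p - p_a + f_e + t_m \rho \frac{\partial^2 w}{\partial t^2} \qquad (5) \]

where

\[ D_{xx} = \frac{E_x t_m^3}{12(1 - \nu_x \nu_y)}, \quad D_{yy} = \frac{E_y t_m^3}{12(1 - \nu_x \nu_y)}, \quad D_{xy} = \frac{(\nu_y E_x + \nu_x E_y)\, t_m^3}{24(1 - \nu_x \nu_y)}, \quad D_{ss} = \frac{G_{xy} t_m^3}{12}, \quad G_{xy} = \frac{E_x E_y}{(1 + 2\nu_x) E_y + (1 + 2\nu_y) E_x} \]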

where w is the membrane deflection, D_xx, D_xy, D_yy and D_ss are flexural rigidities, t_m is the thickness of the micro-membrane, p is the pressure of the fluid, p_a is atmospheric pressure, ρ is the density of the micro-membrane, f_e is the electrostatic stress, E_x and E_y are Young's moduli of the micro-membrane in the x and y directions, respectively, and ν_x and ν_y are Poisson's ratios. Since the micro-membrane is a layered thin film deposited by physical and/or chemical processes, such as CVD, sputtering and vapor deposition, its mechanical properties are assumed to be anisotropic, as in Equation 5.
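To see why the local thickness is such an effective design variable, the following sketch evaluates the flexural rigidity for the element thicknesses used later in the paper. It assumes the isotropic special case D = E t^3 / (12 (1 - ν^2)), which is what the orthotropic rigidities reduce to when E_x = E_y and ν_x = ν_y as in Table 1; the function name is hypothetical.

```python
def flexural_rigidity(young_modulus, thickness, poisson_ratio):
    """Isotropic plate flexural rigidity D = E t^3 / (12 (1 - nu^2)), in N*m."""
    return young_modulus * thickness ** 3 / (12.0 * (1.0 - poisson_ratio ** 2))


if __name__ == "__main__":
    E = 150e9   # Young's modulus in Pa (Table 1)
    NU = 0.3    # Poisson's ratio (Table 1)
    for t_um in (2.0, 3.0, 4.0):
        d = flexural_rigidity(E, t_um * 1e-6, NU)
        print(f"t = {t_um} um -> D = {d:.3e} N*m")
```

Because the rigidity scales with the cube of the thickness, a 3 μm element is about 3.4 times stiffer in bending than a 2 μm element, so a two-level thickness code already gives the optimizer a strong handle on where the membrane bends.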




3)  Fluid  equation 

The fluid in the pump is squeezed out by the micro-membrane deflection. To simulate the fluid flow and the pressure distribution in the micro-pump accurately, it is necessary to consider the fluid flow, the pressure loss and the blowout of the fluid at the outlet, which requires solving the Navier-Stokes equation. However, considering that the aspect ratio of the micro-pump cavity is more than ten and the height is on the order of 20 μm, we may estimate the performance of the micro-pump from the pressure distribution on the deflected micro-membrane and from the volume change of the pump cavity without taking these factors into account. Therefore, we use the modified Reynolds equation as the fluid equation. The outlet is a hypothetical one without a through hole. The compressible fluid pressure is expressed by the modified Reynolds equation, considering slip on the material surface, as


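\[ \frac{\partial}{\partial x}\left(p h^3 \frac{\partial p}{\partial x}\right) + \frac{\partial}{\partial y}\left(p h^3 \frac{\partial p}{\partial y}\right) + 6 l_a p_a \left[\frac{\partial}{\partial x}\left(h^2 \frac{\partial p}{\partial x}\right) + \frac{\partial}{\partial y}\left(h^2 \frac{\partial p}{\partial y}\right)\right] = 6 \mu_a \left[V_x \frac{\partial (p h)}{\partial x} + V_y \frac{\partial (p h)}{\partial y}\right] + 12 \mu_a \frac{\partial (p h)}{\partial t} \qquad (6) \]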


where h is the distance between the micro-membrane and the lower electrode, l_a is the molecular mean free path of the air, t refers to time, V_x and V_y are the velocities of the micro-membrane surface in the x and y directions, respectively, and μ_a is the viscosity of air. h is given by h = h_0 + w, where h_0 is the initial gap between the micro-membrane and the lower electrode. V_x and V_y are calculated from the velocity of the deflection.


Equations (4), (5) and (6) must be solved simultaneously to analyze the deflection of the micro-membrane [4]. The distance between the micro-membrane and the lower electrode is large enough that the effect of the molecular mean free path of the air is negligibly small.
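The claim that slip is negligible can be checked with a short calculation. The sketch below computes the Knudsen number Kn = l_a / h from the Table 1 values and, as an assumption, takes 6 Kn as the relative size of a first-order slip correction at ambient pressure; the function names are hypothetical.

```python
def knudsen_number(mean_free_path, gap):
    """Ratio of the molecular mean free path to the gap height."""
    return mean_free_path / gap


def slip_correction(mean_free_path, gap):
    """Relative size of a first-order slip term (6 * Kn) compared with the no-slip term."""
    return 6.0 * knudsen_number(mean_free_path, gap)


if __name__ == "__main__":
    l_a = 0.064e-6   # mean free path in metres (Table 1)
    h0 = 20e-6       # initial gap in metres (Table 1)
    print("Kn   =", knudsen_number(l_a, h0))
    print("6 Kn =", slip_correction(l_a, h0))
```

With these values Kn is about 0.003 and the slip term is roughly 2% of the no-slip term at the initial gap, which is consistent with treating it as small. The correction grows as the gap closes, so the estimate applies to moderate deflections only.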


In the coupled analysis, the electric field equation is first solved by the boundary element method. Second, the time derivatives in (5) and (6) are evaluated by the implicit method. Finally, the stress equilibrium equations (4) and (5) and the modified Reynolds equation (6) are solved until the deflection w no longer changes. These steps are repeated iteratively, and the boundary elements of the electrostatic field are updated according to the deflection of the micro-membrane.
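The structure of this iteration can be sketched with a deliberately simplified model. The example below replaces the boundary element, plate and Reynolds solvers with a parallel-plate field and a linear spring foundation, and it ignores the fluid entirely; the stiffness, the voltage and all function names are assumptions for illustration, not the authors' implementation. What it keeps is the loop itself: update the gap from the deflection, recompute the electrostatic load, recompute the deflection, and stop when w no longer changes.

```python
EPS = 8.854e-12  # permittivity in F/m (Table 1)


def electrostatic_pressure(voltage, gap):
    """Parallel-plate stand-in for the field solution: eps * (V / h)^2 / 2."""
    return 0.5 * EPS * (voltage / gap) ** 2


def solve_coupled(voltage, h0, stiffness, tol=1e-12, max_iter=200):
    """Fixed-point iteration between the 'field' and 'structure' steps.

    stiffness is a hypothetical foundation stiffness in Pa/m that stands in
    for the plate equation. Returns (deflection, iterations used).
    """
    w = 0.0
    for iteration in range(1, max_iter + 1):
        gap = h0 + w                      # geometry update: h = h0 + w
        if gap <= 0.0:
            raise ValueError("membrane reached the lower electrode")
        pressure = electrostatic_pressure(voltage, gap)   # field step
        w_new = -pressure / stiffness     # structure step: pulled toward the electrode
        if abs(w_new - w) < tol:          # deflection unchanged: converged
            return w_new, iteration
        w = w_new
    raise RuntimeError("coupled iteration did not converge")


if __name__ == "__main__":
    w, n = solve_coupled(voltage=100.0, h0=20e-6, stiffness=1e8)
    print(f"deflection = {w:.4e} m after {n} iterations")
```

The converged deflection is larger in magnitude than the one-shot estimate computed at the initial gap, because the load grows as the gap closes. This feedback is the reason the field and the structure cannot be solved independently.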


CODING  AND  DE-CODING  OF  THE  MEMBRANE  FOR  OPTIMIZATION 

We  used  the  genetic  algorithm  to  determine  the  optimum  thickness  distribution  of  the  micro-membrane  so 
that  the  pressure  of  the  fluid  in  the  vicinity  of  the  outlet  is  maximized. 


In the genetic algorithm, the thickness of the micro-membrane must be coded as 0 or 1. As an example, the coded genes are as follows:

111000001100001111000001100001111000001100001111000001100001111000001100001111000001100001111

Each component of the gene refers to the thickness of a finite element of the membrane; "1" means t1 and "0" means t2, as shown in Fig. 2. Each group of four neighboring meshes has the same thickness.
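The decoding step can be sketched as follows. The example assumes that each gene bit covers a 2 x 2 block of finite elements and that the blocks are numbered row by row; the paper states only that four neighboring meshes share a thickness, so the block shape, the ordering and the function name are illustrative assumptions.

```python
def decode_gene(gene, blocks_x, blocks_y, t_one, t_zero):
    """Map a bit string to a thickness grid; each bit covers a 2 x 2 block of elements.

    gene     : string of "0"/"1", one character per block, row-major (assumed)
    t_one    : thickness assigned to "1" (t1 in the text)
    t_zero   : thickness assigned to "0" (t2 in the text)
    Returns a list of rows, each row a list of element thicknesses.
    """
    if len(gene) != blocks_x * blocks_y:
        raise ValueError("gene length must equal blocks_x * blocks_y")
    grid = [[0.0] * (2 * blocks_x) for _ in range(2 * blocks_y)]
    for index, bit in enumerate(gene):
        if bit not in ("0", "1"):
            raise ValueError("gene must contain only 0 and 1")
        bx, by = index % blocks_x, index // blocks_x
        thickness = t_one if bit == "1" else t_zero
        for dy in (0, 1):
            for dx in (0, 1):
                grid[2 * by + dy][2 * bx + dx] = thickness
    return grid


if __name__ == "__main__":
    # Two blocks side by side: the first thick (3 um), the second thin (2 um).
    for row in decode_gene("10", blocks_x=2, blocks_y=1, t_one=3.0, t_zero=2.0):
        print(row)
```

Tying four elements to one bit shortens the gene by a factor of four, which reduces the search space at the cost of a coarser thickness pattern.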


Fig.  2.  Coding  method  of  thickness  distribution. 


For optimization by a genetic algorithm, a fitness value must be calculated for every individual of the population in each generation. The fitness function should be defined so that it properly expresses the characteristics of the phenomenon to be optimized. As a demonstration of the proposed method, we optimized the thickness distribution of the micro-membrane used for the micro-air-pump shown in Figure 1. If the thickness of the micro-membrane actuated by the electrostatic force is uniform, the deflection of the micro-membrane is largest at the center, and the fluid pressure is also a maximum at the center. Although the volume change of the micro-pump cavity is large enough, the pressure at the outlet is low. Therefore, the performance of a micro-pump with a micro-membrane of uniform thickness may be low. The maximum deflection point changes according to the thickness distribution of the micro-membrane. Since the pressure distribution depends on the distance between the micro-membrane and the lower electrode, we used the distance between the maximum deflection point and the outlet in the fitness function. Furthermore, in order to increase the flow at the outlet, we considered the volume change in the pump cavity due to the micro-membrane deflection. The fitness function f is defined as:


\[ f = 0.7\,\frac{\sqrt{(x_h - x_p)^2 + (y_h - y_p)^2}}{\sqrt{(x_h - x_c)^2 + (y_h - y_c)^2}} + 0.3\,\frac{V_{def}}{V_{init}} \]


where (x_h, y_h) is the outlet position, (x_p, y_p) is the maximum deflection point, (x_c, y_c) is the center of the membrane, V_def is the internal volume of the micro-pump cavity after deflection, and V_init is the initial volume of the pump cavity.
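One reading of this fitness definition is sketched below: a weighted sum of the distance from the maximum deflection point to the outlet, normalized by the distance from the membrane center to the outlet, and the ratio of deflected to initial cavity volume, with smaller values being better. The normalization and the function name are assumptions; the weights 0.7 and 0.3 and the quantities involved are those given in the text.

```python
import math


def fitness(outlet, peak, center, v_def, v_init):
    """Weighted fitness to be minimized (a sketch of one plausible reading).

    outlet, peak, center : (x, y) positions of the outlet, the maximum
                           deflection point and the membrane center
    v_def, v_init        : cavity volume after deflection and initially
    """
    d_peak = math.hypot(outlet[0] - peak[0], outlet[1] - peak[1])
    d_ref = math.hypot(outlet[0] - center[0], outlet[1] - center[1])
    return 0.7 * d_peak / d_ref + 0.3 * v_def / v_init


if __name__ == "__main__":
    outlet, center = (150.0, 0.0), (0.0, 0.0)   # outlet 150 um from the center
    # Uniform membrane: peak at the center, hypothetical volume ratio of 0.9.
    print(fitness(outlet, center, center, 0.9, 1.0))
    # Peak moved 90 um toward the outlet, same hypothetical volume ratio.
    print(fitness(outlet, (90.0, 0.0), center, 0.9, 1.0))
```

Under this reading, a peak that remains 60 μm from an outlet located 150 μm from the center, together with a volume ratio of 0.9, gives a value of about 0.55. The volume ratio of 0.9 is a hypothetical figure chosen for the example, not a reported result.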


We used one-point crossover, mutation and an elite strategy, with crossover and mutation rates of 0.9 and 0.005, respectively. Tournament selection was applied to a population of 20 individuals for 40 generations. In each generation, the deflection was calculated by the coupled analysis for all 20 individuals. The thickness distribution of each individual in the next generation was derived from the results of the previous generation.
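A minimal sketch of such a genetic algorithm is given below, using the stated population size, number of generations and crossover and mutation rates. The tournament size of two, the single elite individual, the random seed and the toy fitness function (a stand-in for the expensive coupled analysis) are assumptions for illustration. Fitness is minimized, in line with the decreasing fitness values reported in the results.

```python
import random


def tournament(population, scores, rng, size=2):
    """Return the best of `size` randomly drawn individuals (lower score wins)."""
    contenders = rng.sample(range(len(population)), size)
    return population[min(contenders, key=lambda i: scores[i])]


def one_point_crossover(a, b, rng):
    """Swap the tails of two parents at one random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(gene, rate, rng):
    """Flip each bit independently with probability `rate`; returns a new list."""
    return [1 - bit if rng.random() < rate else bit for bit in gene]


def run_ga(fitness, gene_length, pop_size=20, generations=40,
           crossover_rate=0.9, mutation_rate=0.005, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(gene_length)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(g) for g in population]
        best = min(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best])
        next_pop = [population[best][:]]       # elite strategy: keep the best unchanged
        while len(next_pop) < pop_size:
            a = tournament(population, scores, rng)
            b = tournament(population, scores, rng)
            if rng.random() < crossover_rate:
                a, b = one_point_crossover(a, b, rng)
            next_pop.append(mutate(a, mutation_rate, rng))
            if len(next_pop) < pop_size:
                next_pop.append(mutate(b, mutation_rate, rng))
        population = next_pop
    scores = [fitness(g) for g in population]
    best = min(range(pop_size), key=lambda i: scores[i])
    history.append(scores[best])
    return population[best], history


TARGET = [1, 0] * 8   # hypothetical target pattern for the toy problem


def toy_fitness(gene):
    """Number of bits that differ from TARGET; stands in for the coupled analysis."""
    return sum(g != t for g, t in zip(gene, TARGET))


if __name__ == "__main__":
    best, history = run_ga(toy_fitness, len(TARGET))
    print(best, history[0], history[-1])
```

Because the elite individual is copied unchanged, the best fitness can never get worse from one generation to the next; it can, however, stall on a plateau.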


RESULTS AND DISCUSSION

The micro-pump is 400 μm long, 200 μm wide and 20 μm high. The material of the micro-membrane is aluminum. The material constants for calculation are summarized in Table 1. In the calculation, the outlet is assumed to be located 150 μm from the center of the micro-pump and 100 μm from the side wall. As a first demonstration of optimization, 2 μm and 3 μm thick elements were distributed. Figure 3 shows the initial thickness distribution of the micro-membrane for optimization. The initial thickness distribution is chosen at random for each of the 20 individuals. Since the thickness is distributed randomly, the deflection of the membrane is maximum at the center.


Fig. 3. Thickness distribution of initial generation.


Table  1:  Material  of  micro-membrane  and  geometrical  properties  for  analysis. 


Parameter                                  Value
Length (μm)                                400
Width (μm)                                 200
Height (μm)                                20
Thickness of the membrane (μm)             2 or 3
Mass density of the membrane (kg/m3)       2330
Young's modulus E_x, E_y (GPa)             150
Poisson's ratio                            0.3
Viscosity (μPa·s)                          17.6
Molecular mean free path (μm)              0.064
Atmospheric pressure (MPa)                 0.101
Permittivity (F/m)                         8.854E-12

The fitness value of each individual decreases with increasing generation, as shown in Figure 4. After the 15th generation, the best fitness among the 20 individuals decreases to 0.55. The average fitness also decreases to about 0.6. The best fitness does not change after 20 generations. At the 10th generation, the number of 2 μm thick elements increases around the outlet, while the number of 3 μm thick elements increases in the middle area of the micro-pump. The flow volume rate increases with increasing generation.


Fig. 4. Fitness and flow volume in each generation.


Figure 5(a) shows the thickness distribution of the best-fitness individual in the 40th generation. The maximum deflection point of the micro-membrane has moved 90 μm toward the outlet (see Figure 5(b)). At the outlet, the pressure of the air becomes a maximum. The bending angle of the micro-membrane is plotted in Figure 5(c). The area with 2 μm thick elements coincides with the area where the bending angle is relatively large. It seems that the bending performance of the micro-membrane is dominant as it deflects. In the genetic algorithm, individuals that bend more easily in the vicinity of the outlet are selected as being better. This may be due to the limited geometry and configuration of the membrane and the pump.


As a second demonstration, we optimized cases with elements 2 μm and 4 μm thick. Figure 6(a) shows the thickness distribution of the best-fitness individual in the 40th generation. The maximum deflection point (see Figure 6(b)) shifts ~40 μm from the result for 2 μm and 3 μm thick elements, but it is still ~20 μm from the outlet point.

In the second example, the maximum deflection point moved toward the outlet point in comparison with that in the first example. However, the pump-out volume rate decreased by about 10% (see Figure 4). It is common knowledge that a genetic algorithm readily reduces the search space but has difficulty locating the best solution. To achieve the optimum, the genetic algorithm must be combined with another search algorithm, such as hill climbing and/or steepest descent. Nevertheless, we have shown that by giving the membrane a thickness distribution we are able to obtain better, or even optimum, performance.


CONCLUDING  REMARKS 

We have developed an analysis method to simulate the deflection of a micro-membrane. We have also developed a method that attempts to optimize the thickness distribution of a micro-membrane actuated by an electrostatic force so that it performs a desired deflection. As an application example, we optimized the thickness distribution of the micro-membrane of a micro-air-pump. It was shown that the deflection pattern strongly depends on the thickness distribution. By using a genetic algorithm, the thickness distribution that realizes the prescribed performance is obtained after the 40th generation in three hours of calculation on a small engineering workstation (EWS). However, the maximum deflection point is still 60 μm from the outlet. The optimization method must be studied further to seek improvement.


Fig. 5. Deflection of 2 and 3 μm thick elements in the 40th generation: (a) thickness distribution of micro-membrane; (b) deflection, with the maximum deflection point marked; (c) magnitude of bending angle (radians).

Fig. 6. Deflection of 2 and 4 μm thick elements in the 40th generation: (a) thickness distribution of micro-membrane.


ACKNOWLEDGEMENT 

This  work  was  supported  by  a  Grant-in-Aid  for  Scientific  Research  (C)(No.  10650258)  by  the  Ministry  of 
Education,  Science,  Sports  and  Culture. 


REFERENCES 

1. H. Kotera, H. Kita, H. Yoshida, Y. Mizoh, 1992. A Scheme for Finite Element Analysis of an Interface Phenomena of VCR Drum, Head and Tape. IEEE Trans. on Consumer Electronics, 38(3).

2. G.K. Ananthasuresh, R.K. Gupta, S.D. Senturia, 1996. An Approach to Macro-modeling of MEMS for Nonlinear Dynamic Simulation. Microelectromechanical Systems (MEMS), ASME, DSC-59.

3. F. Shi, P. Ramesh, S. Mukherjee, 1996. Dynamic Analysis of Micro-Electro-Mechanical Systems. International Journal for Numerical Methods in Engineering, 39, 4119-4139.

4. H. Kotera, Y. Sakamoto, T. Hirasawa, S. Shima, R.W. Dutton, 1998. Dynamic simulation of MEMS coupling electrostatic field, fluid dynamics & membrane deflection. Micro System Tech., 491-496.

5. D.E. Goldberg, 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley.




Manufacturing  of  Metallic  Prototypes  and  Tools 
by  Laser  Cutting  and  Diffusion  Bonding 

S.  Sandig,  P.  Wiesner 

Faculty  of  Mechanical  Engineering,  Technical  University  of  Ilmenau,  Germany 

ABSTRACT 

This paper presents a new rapid metal prototyping technology. Metallic structures are produced directly by laser cutting of metal sheets and subsequently forming a component by diffusion-welding the sheets. The combination of these two flexible manufacturing techniques with a special computer-aided design process offers an excellent means of producing fully functional metallic parts as well as tooling inserts from sheet metal in a short time.


INTRODUCTION 

Today's manufacturing processes must fulfill several requirements. A process must be flexible, with few intermediate stages; capable of automation and integration; and environmentally acceptable, with no harmful by-products and low energy consumption. The first of these requirements are the most important because of the competitive situation. The rate of development is so rapid that today's technologies are soon out of date. A company must respond to market forces in the shortest time possible.

Up  to  50%  of  development  time  is  required  for  the  manufacture  of  prototypes  and  models.  In  some  cases 
the  lifetime  of  the  new  product  is  shorter  than  the  design  time.  This  necessitates  a  drastic  reduction  in  the 
product  development  cycle.  Rapid  prototyping  and  rapid  tooling  techniques  offer  a  way  of  achieving  this. 

Rapid  Prototyping  and  Rapid  Tooling  are  terms  reserved  for  a  new  group  of  computer-based  design  and 
manufacture  processes  able  to  produce  parts  or  tools  directly  from  computer  images.  Stereo-lithography  is 
the best known technique. Traditional applications of rapid prototyping techniques are for non-metallic parts
(polymers  and  paper).  Often  these  parts  can  only  be  used  for  applications  not  requiring  component  strength 
and  durability.  In  addition  to  tolerances,  shape  and  aesthetic  evaluation,  the  requirement  of  functional 
testing  necessitates  material  properties  close  to  those  encountered  in  actual  use.  There  is  growing  demand 
for  functional  prototype  parts  and  prototype  tools  which  accurately  represent  the  characteristics  of 
production  components.  It  is  clear  that  a  prototype  matching  the  chemical,  mechanical  and  thermal 
properties  of  the  end-product  is  a  much  more  valuable  prototype.  For  that  reason,  development  of  new 
flexible  manufacturing  methods  to  produce  metallic  parts  and  tools  is  one  of  the  objectives  of  recent 
research. 

Generating  a  complex  metal  prototype  part  or  tool  for  industrial  application  is  difficult.  Basically  there  are 
two  main  ways  to  produce  truly-metallic  parts  by  rapid-prototyping.  Indirect  methods  such  as  metal 
investment  casting  using  stereo-lithography-based  moulds,  or  selective-laser  sintering  of  moulds  directly  in 
foundry  sand,  work  well,  but  they  are  very  time-consuming  due  to  secondary  tooling.  Therefore,  present 
research  activities  are  focussed  on  developing  direct  methods.  Many  research  institutes  are  examining 
direct-laser  sintering  of  metal  powders  and  laser-cladding  as  well.  RP-components  produced  with  metal 
powder  techniques  are  porous  and  must  be  furnace-sintered  or  infiltrated  with  a  lower-melting-point  alloy  to 
consolidate  the  objects.  All  of  these  techniques  using  metal  powder  are  at  an  early  stage  of  development  and 
at  present  are  limited  to  small  components.  In  many  cases  they  are  unsuitable  to  produce  tools. 

Therefore, we prefer a different approach: a combination of laser cutting and diffusion bonding. This offers an excellent way to produce fully-functional metallic parts and tool inserts from sheet metal in a short time.

MANUFACTURING  PROCESS 




The key idea is based on the concept of the Laminated Object Manufacturing (LOM) techniques, originally developed for manufacturing "wood-like" paper models. In our case the initial material consists of metal sheets. The metal is cut to the required shape just as for paper parts. The metal sheets can be cut with CO2 or Nd:YAG lasers, but water-jet cutting or electrical discharge machining can also be used. Laser cutting was one of the first uses of lasers in material processing and, apart from laser marking, it represents the most common field of application in production technology today [1]. Laser cutting has established itself as a highly flexible and precise process for internal and external contours in components.

The main problem arising with metal sheets is to develop a process to join them together. Possibilities include laser welding of the sheet edges, bonding and, of course, mechanical joining such as threaded joints, rivets or clamps [2]. The disadvantage of all these solutions is that the mechanical stability and density of the resulting prototypes do not correspond to the properties of a conventional metallic part.

If the metal sheets are to be joined over the entire contact surface, diffusion bonding is an excellent method. This is a special joining method which enables matched joining of metallic, silicate and ceramic materials completely in the solid state. Pairings such as copper with copper, steel with copper, titanium with nickel, iron with sapphire, and corundum with aluminium, in total more than 600 different material combinations, can be joined by solid phase diffusion (movement of atoms across the interface) without addition of any material. Unlike
other  pressure  welding  processes  there  is  little  or  no  macroscopic  distortion.  The  bonded  interface  has 
essentially  the  same  physical  and  mechanical  properties  as  the  base  material.  The  surfaces  to  be  joined  are 
pressed  together  and  maintained  in  contact  over  the  entire  surface  so  a  bond  can  be  formed  by  solid-state 
diffusion.  Diffusion  is  carried  out  at  a  temperature,  dependent  on  the  metal,  at  which  the  diffusion  rate  is 
high  and  at  a  pressure  conducive  to  forming  a  good  bond.  Diffusion-welded  bonds  can  be  thermally  and 
mechanically  loaded  and,  furthermore,  they  are  vacuum-tight.  The  strength  of  the  bond  is  of  the  same  order 
of  magnitude  as  that  of  the  basic  material.  Thus,  high  functionality  of  the  prototype  can  be  realised. 

The specific conditions of diffusion welding must be taken into account when designing and constructing the prototype, but basically, the process is very simple. It begins with component and tool design on a 3D-CAD system. The design is done using any CAD software suitable for complex geometry. On completion of the design, an industry-standard file is output which describes the surface of the model. A software program slices the complex 3D model into simple 2D cross sections and converts the file into a format compatible with the postprocessor (e.g., DXF). The outer surface of each layer maintains the 3D geometry of the original model. Then the postprocessor generates the corresponding NC data for the handling unit (i.e., the laser cutting path) and provides specific information required by the laser. A CO2 or Nd:YAG laser is used to cut sheets of the desired material, usually steel. The thickness of the metal sheet, as well as the required alloy, must be selected carefully, as these features influence cutting accuracy and the final properties of the part. It is not unusual for the optimal thickness to vary according to the cross-section of the part. Beam quality, process parameters and the positioning unit all influence the quality of the cut and affect the accuracy of the prototype. After cutting, the sheets are stacked and positioned in a vacuum chamber. The sheets are then joined over their entire surface by diffusion bonding to form the required part or tool. Fig. 1 summarizes the specific manufacturing process.
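The slicing step can be illustrated with a small sketch. It assumes that the model surface is available as a list of triangles (as in a triangulated surface file), that all sheets have the same thickness, and that each sheet contour is taken at the mid-plane of the sheet; these choices and the function names are assumptions for illustration. Contour ordering, conversion to DXF and the postprocessor are not modelled.

```python
def slice_triangle(triangle, z):
    """Return the 2D segment where a triangle crosses the plane at height z, or None.

    triangle is a sequence of three (x, y, z) vertices. Only edges that strictly
    cross the plane are used, so slice planes should not pass exactly through
    vertices (mid-sheet planes usually avoid this).
    """
    points = []
    for i in range(3):
        (x1, y1, z1), (x2, y2, z2) = triangle[i], triangle[(i + 1) % 3]
        if (z1 - z) * (z2 - z) < 0:            # edge strictly crosses the plane
            s = (z - z1) / (z2 - z1)           # interpolation parameter along the edge
            points.append((x1 + s * (x2 - x1), y1 + s * (y2 - y1)))
    return tuple(points) if len(points) == 2 else None


def slice_mesh(triangles, sheet_thickness, height):
    """Cut the mesh at the mid-plane of each sheet; returns {layer index: [segments]}."""
    layers = {}
    count = int(round(height / sheet_thickness))
    for k in range(count):
        z = (k + 0.5) * sheet_thickness
        cuts = (slice_triangle(t, z) for t in triangles)
        layers[k] = [segment for segment in cuts if segment is not None]
    return layers


if __name__ == "__main__":
    a, b, c, d = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
    tetrahedron = [(a, b, c), (a, b, d), (b, c, d), (c, a, d)]
    for layer, segments in slice_mesh(tetrahedron, 0.5, 1.0).items():
        print(layer, segments)
```

Each layer of the tetrahedron yields three segments that together form the triangular outline to be cut from that sheet. Thinner sheets give a closer approximation of sloped surfaces at the cost of more layers to cut and bond.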


BENEFITS 

The  process  enables  fully  functional  parts  or  tools  (moulds  and  dies)  to  be  produced.  In  contrast  to  material 
removal  using  conventional  CNC  machining  operations,  such  as  turning  and  milling,  any  3-D  internal 
contours  can  be  shaped  due  to  the  sandwich  construction.  An  example  is  mould  inserts  with  optimally 
positioned  cooling  pipes  for  heat  diversion.  With  sheet  metal,  it  is  possible  to  combine  thin  and/or  thick 
sheets  of  the  same  or  different  material  into  one  part.  Therefore  it  is  possible  to  optimise  part  properties. 
Components  produced  in  this  manner  can  be  used  directly  in  functional  testing  and  in-field  applications,  so 
shorter  iterative  development  steps  are  possible.  Potential  manufacturing  problems  can  be  eliminated  at  an 
earlier  stage  of  development  when  changes  are  not  so  costly.  The  suggested  technology  is  a  further  example 
of  the  benefits  obtainable  if  conventional  manufacturing  techniques,  such  as  laser  cutting  and  diffusion 


welding,  are  combined  together  with  computer-aided  design  and  engineering  in  a  continuous  process  chain. 
In the presentation, several examples will be provided to demonstrate the potential of this new technique.


Fig. 1. Schematic illustration of the various process steps: product idea/definition of product need; computer-aided design; interface (IGES, VDA, STEP, STL, ...); slicing into layers and generating the corresponding NC data (cross-sectional formats DXF, HPGL, ...); laser cutting; diffusion bonding (pressure, heater, support pad, thermocouple); functional prototype.


REFERENCES 

1. H.K. Tonshoff, 1997. Laser-based manufacturing - competition or ideal complement to conventional production technologies. Proc. LANE97, 3-30.

2. J.-P. Kruth, J. Bonse, I. Meyvaert and B. Morren, 1997. Laser-based Rapid Prototyping a Decade after its Introduction. Proc. LANE97, 93-113.




Evolutionary  Systems  and  Machine  Learning 




Artificial  Immune  Systems  in  Industrial  Applications 

Dipankar  Dasgupta*  and  Stephanie  Forrest** 

* Dept. of Mathematical Sciences, The University of Memphis,
Memphis, TN 38119

** Dept. of Computer Science, The University of New Mexico,
Albuquerque, NM 87131


ABSTRACT 

Artificial  Immune  System  (AIS)  is  a  new  intelligent  problem-solving  technique  that  is  being  used  in  some 
industrial  applications.  This  paper  presents  an  immunity-based  algorithm  for  tool  breakage  detection.  The 
method  is  inspired  by  the  negative-selection  mechanism  of  the  immune  system,  which  is  able  to 
discriminate  between  the  self  (body  elements)  and  the  non-self  (foreign  pathogens).  However,  in  our 
industrial  application,  the  self  is  defined  to  be  normal  cutting  operations  and  the  non-self  is  any  deviation 
beyond  allowable  variations  of  the  cutting  force.  The  proposed  algorithm  is  illustrated  with  a  simulation 
study  of  milling  operations.  The  performance  of  the  algorithm  in  detecting  the  occurrence  of  tool  breakage 
is  reported.  The  results  show  that  the  negative-selection  algorithm  detected  tool  breakage  in  all  test  cases. 


INTRODUCTION 

Manufacturers  are  always  looking  for  ways  to  improve  productivity  without  compromising  on  quality  of 
manufacturing  processes.  To  this  end,  much  attention  has  been  directed  towards  automated  manufacturing. 
In  drilling  or  high-speed  milling  industries,  on-line  monitoring  of  tool  breakage  is  a  key  component  in 
unmanned  machining  operations. 

In  most  milling  industries,  a  reliable  and  effective  tool  breakage  detection  technique  is  required  to  respond 
to unexpected tool failure [1]. In particular, such a monitoring technique is necessary to prevent possible
damage  to  the  workpiece  and  the  machine  tool  or  to  avoid  production  of  defective  parts  and  possible 
overloading  of  tools.  The  normal  operation  of  a  milling  cutter  is  often  characterized  from  the  measurements 
of  some  parameters  that  are  correlated  with  tool  wear.  It  is  essential  to  detect  the  occurrence  of  abnormal 
events  as  quickly  as  possible  before  any  significant  performance  degradation  results.  This  can  be  done  by 
continuous  monitoring  of  the  system  for  changes  from  the  normal  behavior  patterns. 

Thus, a signal may be sent to the machine controller/operator to trigger an emergency stop of the machine so that the tool can be changed. Several techniques have been suggested for monitoring tool breakage in different machining operations [1]. Recent efforts include time-series analysis [2], Artificial Intelligence (AI) techniques [3], pattern recognition methods [4], fuzzy set theory [5], and neural networks [6] applied to the problem of recognizing the cutting states and detecting tool breakage. Among these, neural network-based techniques have been used to detect tool breakage in milling and to monitor manufacturing processes [7].

Most  methods  require  prior  knowledge  about  various  fault  conditions  [8]  or  tool  breakage  patterns  [7].  It  is 
difficult  to  obtain  a  variety  of  good  tool  breakage  patterns  in  an  industrial  environment.  A  robust  method 
should  detect  any  unacceptable  (unseen)  change  rather  than  looking  for  specific  known  activity  patterns. 
This  paper  proposes  a  new  detection  algorithm  for  tool  condition  monitoring  in  milling  operations.  The 
algorithm  is  based  on  ideas  from  the  immune  system.  It  is  a  probabilistic  method  that  notices  changes  in 
force  pattern  of  tools  without  requiring  prior  knowledge  of  what  changes  to  look  for.  In  this  way,  it 
resembles the approach to novelty detection taken by ART neural architectures [9]. Both neural networks
and  our  immune  system-based  algorithm  are  biologically-inspired  techniques  that  have  the  capability  of 
identifying  patterns  of  interest.  However,  they  use  different  mechanisms  for  recognition  and  learning. 




In the next section, the basic immunity-based detection algorithm is described. The problem, simulated cutting force dynamics in a milling process, is then discussed, after which the proposed method is demonstrated for tool breakage detection by monitoring (simulated) cutting force patterns. This includes the preprocessing of sensory data and the implementation details of generating detector sets for monitoring tool conditions. The results of different experiments and our evaluation of performance are then given, followed by final conclusions.


IMMUNITY-BASED  CHANGE  DETECTION  ALGORITHM 

This  detection  algorithm  is  inspired  by  the  information-processing  properties  of  the  natural  immune  system 
[10].  The  immune  system  uses  learning,  memory,  and  associative  retrieval  to  solve  pattern  recognition 
problems.  Vertebrate  immune  systems  are  capable  of  distinguishing  virtually  any  foreign  cell  or  molecule 
from  the  body's  own  cells  which  are  created  and  circulated  internally.  This  is  known  as  the  self-nonself 
discrimination  problem  [11].  In  the  immune  system,  T  cells  have  receptors  on  their  surface  that  can  detect 
foreign  proteins  (antigens).  During  the  generation  of  T  cells,  receptors  are  made  by  a  pseudo-random 
genetic  rearrangement  process.  Then  they  undergo  a  censoring  process,  called  negative  selection,  in  the 
thymus where T cells that react against self-proteins are destroyed, so only those that do not bind to
self-proteins are allowed to leave the thymus. This censoring process is very important in self-nonself
discrimination.  Our  artificial  immune  system  [10]  is  a  simplification  of  the  complex  chemistry  of 
antibody/antigen  recognition  in  natural  immune  systems.  The  basic  principle  of  our  negative-selection 
algorithm  is  as  follows: 

•  Define self as a multiset S of strings of length l over a finite alphabet, a collection that we wish to
protect or monitor. For example, S may be a segmented file, or a normal pattern of activity of some
system or process.

•  Generate a set R of detectors, each of which fails to match any string in S. We use a partial matching
rule, in which two strings match if and only if they are identical in at least r contiguous positions, where r
is a suitably chosen parameter [10].

•  Monitor  S  for  changes  by  continually  matching  the  detectors  against  S.  If  any  detector  ever  matches,  a 
change  (or  deviation)  must  have  occurred. 

Matching  Rule 

We adopted a partial-matching rule based on a prespecified degree of similarity. To measure this similarity,
we currently use an r-contiguous matching rule between two strings of equal length. Thus, for any
two strings x and y, match(x, y) is true if x and y agree (match) on at least r contiguous locations (r less than
or equal to l), as illustrated in Fig. 1.

x :  bcabcbad

y :  dcabdcba

Fig. 1. Illustration of Matching Rule:
x and y are two strings defined over the four-letter alphabet a, b, c, d.

x and y match at 3 contiguous locations (positions 2 to 4, "cab").

Thus, match(x, y) is true for r ≤ 3 and false for r > 3.
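The matching rule of Fig. 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation; the function name r_contiguous_match is an assumption introduced here.

```python
def r_contiguous_match(x: str, y: str, r: int) -> bool:
    """Return True if x and y agree on at least r contiguous positions."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    run = 0  # length of the current run of agreeing positions
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False


# The two strings of Fig. 1 agree on exactly 3 contiguous positions.
x, y = "bcabcbad", "dcabdcba"
print(r_contiguous_match(x, y, 3))  # True
print(r_contiguous_match(x, y, 4))  # False
```

A single pass with a running counter is enough because only the longest run of agreeing positions matters.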

A  partial-matching  rule  provides  a  detector  with  its  ability  to  detect  sample  strings  in  its  neighborhood 
according  to  the  threshold  value,  r.  This  is  demonstrated  in  Fig.  2,  for  a  binary  string.  The  graphs  in  Fig.  2 
illustrate  that  the  coverage  of  a  string  of  defined  length  increases  exponentially  with  a  decrease  in  r. 
Although  maximum  coverage  can  be  achieved  with  r  =  7,  the  generated  detectors  will  probably  match 
many self strings resulting in false detection. On the other hand, a perfect matching (for r = l) implies that
the symbols are identical at each location in two strings; accordingly, a very large number of detectors is
needed to detect patterns in the non-self space. A well-chosen r value yields a reasonably sized detector set,
which is important for the success of this method.
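The trade-off between coverage and threshold can be checked by brute force for short strings. The sketch below counts how many binary strings a single detector matches for each r; the 10-bit detector and the helper names are illustrative assumptions, and the exhaustive enumeration is only practical for small l.

```python
from itertools import product


def r_contiguous_match(x, y, r):
    """True if x and y agree on at least r contiguous positions."""
    run = 0
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False


def coverage(detector: str, r: int) -> int:
    """Count the binary strings of the same length that the detector matches."""
    l = len(detector)
    return sum(
        r_contiguous_match(detector, "".join(bits), r)
        for bits in product("01", repeat=l)
    )


detector = "0110100101"  # arbitrary 10-bit example (assumption)
for r in range(1, len(detector) + 1):
    print(r, coverage(detector, r))
```

With r = l the detector covers only itself, while with r = 1 it covers every string except its bitwise complement, which is why very small thresholds tend to produce detectors that also match self.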




Fig. 2. The number of points that can be covered by each binary string (of defined length
in its string space) varies with the matching threshold.

When  a  non-overlapping  set  of  detectors  is  generated  with  a  suitable  matching  threshold,  each  one  can 
serve  as  a  distinct  novelty  pattern  class  in  the  non-self  space.  However,  in  the  case  of  overlapping  detectors, 
multiple  detectors  may  be  activated  for  a  sample  (abnormal)  pattern,  and  more  are  needed  to  provide 
sufficient  coverage  in  the  non-self  space. 

Generating  Detectors 

There  are  many  possible  ways  to  generate  detectors  in  the  non-self  space.  These  approaches  generally  have 
different  computational  complexities  dependent  on  the  choice  of  matching  rule.  In  the  original  description 
of  the  negative-selection  algorithm  [10],  candidate  detectors  are  generated  randomly  and  then  tested 
(censored)  to  see  if  they  match  any  self  string.  If  a  match  is  found,  the  candidate  is  rejected.  This  process  is 
repeated  until  the  desired  number  of  detectors  is  generated.  A  probabilistic  analysis  is  used  to  estimate  the 
number of detectors that are required to provide a given level of reliability. The major limitation of the
random generation approach appears to be the computational difficulty of generating valid detectors, which
grows exponentially with the size of self. Also, for many choices of l and r, and compositions of self, the
random generation of strings for detectors may be prohibitive.
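The random generate-and-censor procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the max_tries cap, and the rejection of duplicate detectors are choices made here and are not taken from [10].

```python
import random


def r_contiguous_match(x, y, r):
    """True if x and y agree on at least r contiguous positions."""
    run = 0
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, n_detectors, r, alphabet="01",
                       rng=None, max_tries=100_000):
    """Draw random candidates and keep those that match no self string."""
    rng = rng or random.Random()
    l = len(self_set[0])
    detectors = []
    for _ in range(max_tries):  # cap the search; censoring may reject most candidates
        if len(detectors) == n_detectors:
            break
        cand = "".join(rng.choice(alphabet) for _ in range(l))
        censored = any(r_contiguous_match(cand, s, r) for s in self_set)
        if not censored and cand not in detectors:
            detectors.append(cand)
    return detectors


self_set = ["00000000", "11110000", "00001111"]  # toy self set (assumption)
print(generate_detectors(self_set, 5, r=5, rng=random.Random(0)))
```

The cost of each accepted detector grows with the fraction of candidates rejected, which is the limitation the text attributes to this approach.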

In this paper, we generate detector sets using an improved algorithm proposed by Helman and Forrest [12], which runs
in time linear in the size of self. The algorithm has two phases. First, it employs a dynamic programming technique
to count recurrences in order to define an enumeration of all unmatched strings (i.e., all feasible detectors).
Second, a random subset of this enumeration is chosen to generate a detector set. In other words, given a
collection of self strings S and matching threshold r, the first phase of the algorithm determines the total
number of unmatched strings that exist for the defined self (S). Then, in the second phase, some of these are
selected to generate detectors for monitoring self (normal patterns).
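The two-phase structure (enumerate the feasible detectors, then pick a random subset) can be illustrated with a brute-force stand-in. The sketch below is explicitly not the linear-time dynamic programming algorithm of [12]; it enumerates all binary strings exhaustively, so it is only usable for short strings, and all names are assumptions.

```python
import random
from itertools import product


def r_contiguous_match(x, y, r):
    """True if x and y agree on at least r contiguous positions."""
    run = 0
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False


def enumerate_unmatched(self_set, r):
    """Phase 1 (brute force): list every binary string that matches no self string."""
    l = len(self_set[0])
    return [
        cand
        for cand in ("".join(bits) for bits in product("01", repeat=l))
        if not any(r_contiguous_match(cand, s, r) for s in self_set)
    ]


def pick_detectors(self_set, r, n_detectors, rng):
    """Phase 2: choose a random subset of the feasible detectors."""
    candidates = enumerate_unmatched(self_set, r)
    chosen = rng.sample(candidates, min(n_detectors, len(candidates)))
    return chosen, len(candidates)


detectors, total = pick_detectors(["00000000", "11111111"], 4, 3, random.Random(1))
print(total, detectors)
```

Knowing the total number of unmatched strings in advance also shows immediately when a given r leaves no feasible detectors at all.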


SIMULATION  OF  CUTTING  TOOL  DYNAMICS 

The  dynamics  of  a  machining  process  can  generally  be  monitored  when  it  operates  in  a  defined 
environment [1]. Usually, the methods for monitoring a milling process utilize measurements of cutting
parameters correlated with tool breakage [13]. These cutting parameters include temperature [14], cutting
force  [13],  vibration  [15],  torque  [16],  acoustic  emission  [17],  etc.  Of  these  parameters,  cutting  forces  are 
widely  used  for  tool  breakage  detection  [13,  18,  19]  for  several  reasons: 

•  Cutting  force  signals  are  much  less  dependent  on  structure. 

•  Cutting  force  signals  can  be  simulated  easily  and  more  accurately  than  acceleration  and  acoustic 
emission  signals. 

•  The cutting force is a very good indicator of the vibration between the tool and workpiece because of
its higher sensitivity and more rapid response to changes in cutting state.




The cutting force variation characteristics of normal and broken tools are different. Under normal (stable)
cutting conditions, the cutting force varies periodically with the tooth frequency, $\Omega$, which depends on the spindle
speed:

$$\Omega = \frac{N P}{60} \qquad (1)$$

where N is the spindle speed in rpm and P is the number of teeth on the cutter.

If a tooth is broken, it cannot remove the same amount of material as the other teeth. Accordingly, the
number of tooth periods deviating from the stable cutting pattern depends on the number of teeth that are
actively involved in the cutting zone.

Some  approaches  for  tool  breakage  detection  are  based  on  an  analysis  of  signal  spectra  obtained  from  prior 
FFT  signal  preprocessing  where  signal  magnitude  at  specific  frequencies  increases  when  tool  fractures 
occur.  However,  Moore  and  Reif  [15]  demonstrated  that  tool  breakage  can  be  more  reliably  monitored  in 
the  time-domain  than  in  the  frequency-domain. 

We prepared simulated data for cutting operations using the vibratory model described in [13, 20]. This
model  has  been  used  by  many  other  investigators  for  tool  breakage  detection  [2,  18]. 

To generate data, the spindle was represented by a vibratory system with two degrees of freedom in the two
orthogonal directions X and Y. We considered a four-tooth cutter with uniform pitch, performing an end-milling,
half-immersion cut in the X-direction. In this model, the instantaneous cutting force at angle $\varphi$ is
assumed to be proportional to the chip thickness, $h$. Forces acting on a tooth are the tangential force, $F_t$, and
radial force, $F_r$. The instantaneous tangential force $F_t$ can be approximated by considering the tool
displacement as:

$$F_t = K_c\, b\, h \qquad (2)$$

subject to the condition that if $F_t < 0$ then $F_t = 0$.

In Eq. 2, $K_c$ is the dynamic cutting force coefficient, $b$ is the axial depth of cut, and $h$ is the chip thickness.
The instantaneous chip thickness is obtained from:

$$h = f \sin\varphi - z + z_{\min}$$

where $f$ is the feed rate per tooth, and $z$ is the displacement of the tool normal to the machined surface,
which is derived from vibratory displacements in the X and Y directions; $z_{\min}$ is the minimum undulation
left behind in preceding cuts at the angle $\varphi$.

The displacement $z$ in the direction normal to the cut surface is given by:

$$z = x \sin\varphi + y \cos\varphi \qquad (3)$$

The corresponding instantaneous radial component of the cutting force is:

$$F_r = K_r F_t \qquad (4)$$

In previous studies [13, 20], the value of $K_r$ was assumed to be 0.3.

For non-helical teeth, the instantaneous cutting forces in the X and Y directions can be obtained by
decomposing the cutting forces $F_t$ and $F_r$ into the X and Y directions:

$$F_x = F_t \cos\varphi + F_r \sin\varphi = F_t(\cos\varphi + K_r \sin\varphi)$$

$$F_y = -F_t \sin\varphi + F_r \cos\varphi = F_t(-\sin\varphi + K_r \cos\varphi) \qquad (5)$$
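The single-tooth force model of Eqs. 2 to 5 can be written as one small function. This is a sketch under stated assumptions: the function name and argument order are introduced here, and the caller is assumed to supply all quantities in mutually consistent units.

```python
import math


def tooth_forces(phi, x, y, z_min, f, b, Kc, Kr=0.3):
    """Return (Fx, Fy) for one tooth at rotation angle phi (radians)."""
    z = x * math.sin(phi) + y * math.cos(phi)       # Eq. 3: displacement normal to the cut
    h = f * math.sin(phi) - z + z_min               # instantaneous chip thickness
    Ft = max(Kc * b * h, 0.0)                       # Eq. 2, clamped so that Ft >= 0
    Fr = Kr * Ft                                    # Eq. 4
    Fx = Ft * math.cos(phi) + Fr * math.sin(phi)    # Eq. 5
    Fy = -Ft * math.sin(phi) + Fr * math.cos(phi)
    return Fx, Fy


# Rigid-cutter example (no vibration, no undulation) at phi = 90 degrees.
print(tooth_forces(math.pi / 2, 0.0, 0.0, 0.0, f=0.2e-3, b=0.508e-3, Kc=6.67e6))
```

The clamp on the tangential force covers the case where vibration carries the tooth out of the material, so that the chip thickness would otherwise be negative.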

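In the case of multi-tooth milling, the instantaneous cutting forces in the X and Y directions are expressed as:

$$F_X = \sum_{i=1}^{P} \delta(i)\, F_x(\varphi_i), \qquad F_Y = \sum_{i=1}^{P} \delta(i)\, F_y(\varphi_i) \qquad (6)$$

and

$$\delta(i) = \begin{cases} 1 & \text{if } \varphi_s \le \varphi_i \le \varphi_e, \\ 0 & \text{otherwise.} \end{cases} \qquad (7)$$

In Eq. 7, $\varphi_s$ and $\varphi_e$ are the start and exit cutting angles, and $\varphi_i$ is the cutting edge rotation angle of the i-th
tooth.

The instantaneous resultant cutting force is:

$$F = (F_X^2 + F_Y^2)^{1/2} \qquad (8)$$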
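The summation over teeth and the resultant of Eq. 8 can be sketched as below. The sketch assumes uniformly pitched teeth and treats a tooth as engaged when its angle lies between the start and exit angles; the function name, the tooth_force callback, and the direction in which tooth angles are offset are assumptions made for illustration.

```python
import math


def resultant_force(phi, n_teeth, phi_s, phi_e, tooth_force):
    """Sum per-tooth forces over engaged teeth; return (FX, FY, F).

    tooth_force(phi_i) must return the (Fx, Fy) pair of a single tooth.
    """
    FX = FY = 0.0
    pitch = 2.0 * math.pi / n_teeth                 # uniform pitch between teeth
    for i in range(n_teeth):
        phi_i = (phi + i * pitch) % (2.0 * math.pi)
        if phi_s <= phi_i <= phi_e:                 # engagement indicator
            fx, fy = tooth_force(phi_i)
            FX += fx
            FY += fy
    return FX, FY, math.hypot(FX, FY)               # Eq. 8


# Four-tooth cutter, half-immersion cut (0 to pi/2), constant dummy tooth force.
print(resultant_force(math.pi / 4, 4, 0.0, math.pi / 2, lambda p: (1.0, -2.0)))
```

A broken tooth could be modeled in this sketch by having the callback return a reduced or zero force for that tooth's index, though the callback here receives only the angle.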

At every angle $\varphi$, the vibration amplitudes induced by the cutting forces are taken into account: the force
components $F_x$ and $F_y$ of $F$ excite vibrations in the X and Y directions, which can be determined from the
equations of motion for the system:

$$F_x = m_x \ddot{x} + c_x \dot{x} + k_x x$$

$$F_y = m_y \ddot{y} + c_y \dot{y} + k_y y \qquad (9)$$

In Eq. 9, the structural parameters of the two modes of vibration are the mass $m$, the damping coefficient $c$,
and the stiffness $k$. At time step $t$, the cutter has rotated by the angle $\varphi$ from the reference axis Y. The
cutting force profiles were simulated using the fourth-order Runge-Kutta method at every time step
($\delta t$ = 0.0001 sec), where displacements at step $t+1$ are calculated from the cutting force data at step $t$.
The computation loop is thus repeated for every time step $\delta t$ and, as a whole cycle, for every tooth period.
The deflections are used to determine the uncut chip thickness for each tooth in cut (in the direction normal
to the cut surface, $z$). Once the uncut chip thickness is determined, its value is used to determine the
instantaneous cutting forces.
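One time step of the integration can be sketched with a classical fourth-order Runge-Kutta update for a single vibration mode. This is an illustrative sketch, not the authors' code: the function name is an assumption, the force is held constant over the step (as the text describes, displacements at the next step are computed from the force at the current step), and the 100 N constant load is an arbitrary test value.

```python
def rk4_step(x, v, force, m, c, k, dt):
    """Advance m*x'' + c*x' + k*x = force by one step of size dt."""
    def acc(x_, v_):
        return (force - c * v_ - k * x_) / m

    k1x, k1v = v, acc(x, v)
    k2x, k2v = v + 0.5 * dt * k1v, acc(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
    k3x, k3v = v + 0.5 * dt * k2v, acc(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
    k4x, k4v = v + dt * k3v, acc(x + dt * k3x, v + dt * k3v)
    x_new = x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
    v_new = v + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
    return x_new, v_new


# Structural parameters quoted in the text; constant load chosen for illustration.
m, c, k, dt = 10.0, 471.9, 8.1e6, 1e-4
x = v = 0.0
for _ in range(5000):
    x, v = rk4_step(x, v, 100.0, m, c, k, dt)
print(x, 100.0 / k)  # displacement settles near the static deflection F/k
```

In a full simulation the force passed to each step would be recomputed from the current chip thickness, and the same update would be applied independently to the X and Y modes.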


In our experiments, one tooth is engaged in the cut at an angle $\varphi$, where the cutting angle varies from 0 to
$\pi/2$ for every tooth engagement. The complete breakage of one tooth was simulated: the broken tooth
either removed no material or removed less material than the other teeth, which gave periodic
amplitude fluctuations in the cutting force.

For  detailed  analysis  of  the  vibratory  model,  and  calculation  of  the  cutting  force  and  vibration,  the  reader  is 
referred  to  [13, 20].  The  parameters  used  in  our  simulation  are: 


Damping coeff., cx = cy = 471.9 kg/s;          Mass, mx = my = 10 kg;
Spring constant, kx = ky = 8.1 × 10^6 N/m;     Feed rate/tooth, f = 0.2 mm;
Cutting coefficient, Kc = 6.67 × 10^6 N/m;     Depth of cut, b = 0.508 mm;
Spindle speed, Ns = 600 rpm;                   Spindle diameter, D = 40 mm.




TOOL  BREAKAGE  MONITORING 

We formulated the tool breakage detection problem as the problem of detecting temporal changes
(or abnormal patterns) in cutting force patterns resulting from a broken cutter. The patterns are encoded
as strings, and each current string is checked against the detectors produced by negative
selection; a match implies a shift from the normal behavior of the cutting force patterns.

Data  Preprocessing 

We  preprocess  sensory  data  into  a  form  suitable  for  the  detection  algorithm.  Preprocessing  can  be  viewed 
as  constructing  an  alternative  representation  to  try  to  capture  the  regularities  of  the  data  while  preserving 
the  information  content.  Furthermore,  any  change  that  exceeds  allowable  variation  in  the  data  pattern 
should  ideally  be  reflected  in  the  representation  space.  This  can  be  a  problem  when  very  small  changes  in 
real-valued  data  need  to  be  monitored.  To  handle  this,  we  use  an  approach  that  maps  close  real-valued  data 
into  a  discrete  form:  an  analog  value  is  normalized  with  respect  to  a  defined  range  and  discretized  into  bins 
(or  intervals).  Each  datum  is  assigned  the  integer  corresponding  to  the  bin  into  which  it  falls. 

The integer is then encoded using binary representation. However, if an observed value falls outside the
specified range, it is mapped to all 0's or all 1's depending on which side of the range it crossed. The
number of bits used in the discretization thus determines the size of the bins. If each datum is encoded by n
bits (which may be chosen according to the desired precision), then there are $2^n - 2$ different bins between
the maximum (MAX) and minimum (MIN) ranges of data (see Fig. 3).


A  Coding  Scheme 

Fig.  3.  Illustration  of  a  mapping  technique  for  encoding  close  analog  values  into  a  discrete  form.  For  binary 
encoding  with  n  bits/data,  the  number  of  intervals  and  the  size  of  each  interval  (d)  are  shown  here. 
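The coding scheme can be sketched as a short function. This is a minimal illustration under stated assumptions: the function name is introduced here, values below MIN are assumed to map to all 0's and values above MAX to all 1's (the text does not say which side maps to which code), and in-range values are assumed to use the codes 1 to 2^n - 2.

```python
def encode(value: float, lo: float, hi: float, n_bits: int) -> str:
    """Map a real value to an n-bit string using 2**n_bits - 2 in-range bins."""
    n_bins = 2 ** n_bits - 2
    if value < lo:
        code = 0                          # below MIN: all 0's (assumed side)
    elif value > hi:
        code = 2 ** n_bits - 1            # above MAX: all 1's (assumed side)
    else:
        # Normalize to [0, 1], pick a bin, and shift by 1 to skip the all-0 code.
        idx = int((value - lo) / (hi - lo) * n_bins)
        code = min(idx, n_bins - 1) + 1   # value == hi falls in the last bin
    return format(code, f"0{n_bits}b")


print(encode(-1.0, 0.0, 1.0, 3))  # 000
print(encode(0.0, 0.0, 1.0, 3))   # 001
print(encode(1.0, 0.0, 1.0, 3))   # 110
print(encode(2.0, 0.0, 1.0, 3))   # 111
```

Reserving the two extreme codes for out-of-range values is what leaves 2^n - 2 bins for the interval between MIN and MAX.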

Implementation  Details 

In our implementation, raw sensory data are sampled from a moving time window and mapped to binary
form. Each window, therefore, is the concatenation of a fixed number (called Win_size) of encoded data points. We
collect the bit strings from a succession of windows, sliding along the time series in discrete steps
(Win_shift) for the normal data set. As long as the time series data pattern maintains similar behavior, these
collected strings are sufficient to define normal behavior of the system. This collection of window strings
is our self set (S). We then generate strings that do not match any of the strings in S to be
members of the detector set. The generation of detectors in this detection algorithm is usually performed
off-line, as in the case of supervised neural network training for fault detection [7] or the development of
rule-based expert systems for detecting faults or anomalies [21]. Overall, our approach can be summarized as
shown in Fig. 4.

The  steps  in  the  method  are  as  follows: 

1.  Collect time series (sensor) data that sufficiently exhibit the normal behavior of a system (these may be
raw data at each time step, or average values over a longer time interval).

2.  Examine the data series to determine the range of variation (MAX, MIN values) of data and choose the
data encoding parameter (n) according to the desired precision.

3.  Encode  each  value  in  binary  form  using  the  above  coding  scheme. 

4.  Choose a suitable window size that can capture the semantics of the data pattern.

5.  Slide the window along the time series and store the encoded string for each window as self for
processing by the negative-selection algorithm.

6.  Generate a set of detectors that do not match any of the self strings according to the partial matching
rule with suitably chosen r. It is desirable that the detectors are spread out enough to cover the unmatched
string (non-self) space. An estimate of the size of the detector set is also needed to ensure a certain
level of reliability in detecting changes [10].

7.  Once a unique set of detectors is generated from the normal database, it can probabilistically detect any
change (or abnormality) in patterns of the monitored sensory data.

8.  When monitoring the system, we use the same preprocessing parameters as in steps 3 and 4 to encode
new data patterns (moving window). If a detector is activated (matched with the current pattern), a
change in behavior is known to have occurred and an alarm signal is generated regarding the
abnormality. We use the same matching rule to monitor the system as was used to generate detectors.
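Steps 2 to 5 (encoding and windowing) can be sketched as follows. The sketch is illustrative: the function names, the sine wave standing in for normal sensor data, and the choice of which out-of-range side maps to all 0's are assumptions. With 1000 data points, 6 bits per datum, and a window size and shift of 5, it yields 200 self strings of length 30.

```python
import math


def encode(value, lo, hi, n_bits):
    """Binary code of a value using 2**n_bits - 2 bins between lo and hi."""
    n_bins = 2 ** n_bits - 2
    if value < lo:
        code = 0                                   # below range (assumed all 0's)
    elif value > hi:
        code = 2 ** n_bits - 1                     # above range (assumed all 1's)
    else:
        code = min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1) + 1
    return format(code, f"0{n_bits}b")


def build_self(series, lo, hi, n_bits, win_size, win_shift):
    """Encode each datum, then concatenate codes window by window."""
    encoded = [encode(v, lo, hi, n_bits) for v in series]
    return [
        "".join(encoded[i:i + win_size])
        for i in range(0, len(encoded) - win_size + 1, win_shift)
    ]


# Synthetic periodic signal used as a stand-in for normal cutting force data.
series = [math.sin(2 * math.pi * t / 50) for t in range(1000)]
self_set = build_self(series, -1.0, 1.0, n_bits=6, win_size=5, win_shift=5)
print(len(self_set), len(self_set[0]))  # 200 30
```

The self set is kept as a list rather than a Python set because self is defined as a multiset, so repeated window patterns are retained.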


[Fig. 4 diagram: processing stages grouped into "Off-line" and "On-line" parts; legend: "Different components of immune system model".]

Fig. 4. Schematic diagram showing the processing stages of immune system-based fault detection.




The  encoding  parameters  that  affect  preprocessing  are: 

BITS_PER_DATA (n) - this dictates the degree of numerical precision with which real numbers are
represented in binary form. For example, 5-bit data encoding gives 30 intervals into which the range [MIN,
MAX] of data is divided.

WINDOW_SIZE (w) - the number of samples encoded in a single pattern (each string in self).

WINDOW_SHIFT - the number of samples by which one pattern is shifted from the previous one in a
moving window. For example, if WINDOW_SHIFT = 1 with a window size w, the patterns will be
{x_1, x_2, ..., x_w}, {x_2, x_3, ..., x_(w+1)}, etc.

EXPERIMENTAL  RESULTS 

We simulated several test cases of the milling cutter dynamics to carry out a set of experiments with the
proposed detection algorithm. The purpose was to detect tool breakage in different cutting environments.

Fig. 5 shows typical cutting force patterns with and without tool breakage in a simulated milling operation.
In this simulation, the tool was in normal cutting operation for 1500 time steps and then one tooth was
broken, causing changes in cutting force signals at the corresponding tooth periods. In our experiments, we
used the first 1000 data points as the self set, S, for generating detectors, and the rest of the data series
for testing. Results of the experiments are shown in Table 1 and in Fig. 6. Table 1 shows the various
parameters used for preprocessing data and for generating detectors. We tried several different parameter
values and found the reported values most suitable. We generated a diverse set of detectors in such a way
that they do not match each other under the r-contiguous bit rule. In these experiments, we set n = 6 for binary
encoding of data, and two different window sizes were considered. The detection results (columns 4 and 5) show
the mean number of times detectors were activated and the average detection rate in each case.


Fig.  5.  Simulated  cutting  force  signals  of  normal  operation  and  with  tool  breakage  in  a  milling 
operation.  Here,  one  tooth  of  the  cutter  is  broken  after  1500  time  steps. 

In  all  test  runs,  the  generated  detectors  could  detect  the  tooth  periods  in  which  the  changes  in  the  force 
pattern  occurred.  Fig.  6  shows  a  typical  run  and  the  number  of  activated  detectors  (novel  patterns 
encountered)  at  different  time  steps.  In  this  example,  a  maximum  of  three  detectors  is  activated  (out  of  20) 
when  there  are  significant  changes.  Note  that  the  detectors  remain  inactive  during  the  normal  operating 
period,  in  particular,  between  1000  and  1500  time  steps  where  the  data  exhibit  a  normal  pattern,  thus 
avoiding  false  positives.  Also  all  broken  tooth  periods  can  easily  be  detected. 
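The monitoring stage, which counts how many detectors are activated by each encoded window, can be sketched as below. The toy detectors and windows are assumptions chosen for illustration, and the alarm threshold parameter is a hypothetical addition rather than part of the reported experiments.

```python
def r_contiguous_match(x, y, r):
    """True if x and y agree on at least r contiguous positions."""
    run = 0
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False


def activated_counts(windows, detectors, r):
    """For each encoded window, count the detectors that match it."""
    return [
        sum(r_contiguous_match(d, w, r) for d in detectors)
        for w in windows
    ]


def alarms(counts, threshold=1):
    """Indices of windows where at least `threshold` detectors fired (hypothetical rule)."""
    return [i for i, c in enumerate(counts) if c >= threshold]


detectors = ["1111", "1100"]                  # toy detectors (assumption)
windows = ["0000", "0001", "0111", "1101"]    # first two play the role of normal patterns
counts = activated_counts(windows, detectors, r=3)
print(counts)          # [0, 0, 1, 1]
print(alarms(counts))  # [2, 3]
```

Plotting the counts against the window index gives the kind of picture described for Fig. 6, with zero activations during normal operation and nonzero activations in novel periods.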

Further experiments were conducted with various cutting parameters to simulate cutting forces for normal
and broken tool conditions, as summarized in Table 2. In each case, the first 1500 data points were taken as the
measurement of normal cutting and the rest as that of the broken tool. In all experiments, the encoding
parameter n was set to 5. Two different window sizes were considered with different parameter settings.
Experiments were repeated (10 times) for each cutting condition, where a small set of detectors was
generated from the initial 1000 data points and used for monitoring the rest. Results of the experiments are shown in
Table 3. These experiments indicate that our algorithm can easily detect tool breakage in all test cases.




[Fig. 6 plot: number of detectors activated versus number of time steps (0 to 2000); 20 detectors generated from the first 1000 data.]

Fig. 6. The height of vertical lines in the graph corresponds to the number
of detectors activated when novel patterns are found.

The results agree with our theory that performance is a function of the matching threshold (r). With larger
r, each generated detector becomes sensitive only to a particular novelty in data patterns, so more detectors are
needed to achieve a desired level of reliability. On the other hand, if r is too small, it may be impossible to
generate a detector set of reasonable size from the available self, since unmatched strings (non-self) may
not exist. This suggests that r can be used to tune detection reliability against the risk of false positives.

Table 1.

Tool breakage detection results, averaged over 50 runs. Column 4 shows the mean number of
detections. The standard deviations are shown in parentheses. The detection rate is shown in
column 5. This is the ratio of the average detection to the number of actual novel data patterns.

Encoding            Matching        Number of        Breakage detection    Detection
parameters          threshold (r)   detectors (R)    Mean (Std. Dev.)      rate (%)

Win_size = 5             10              50             14.30 (2.32)         59.58
Win_shift = 5             9              40             17.57 (2.25)         74.32
l = 30, S = 200           8              30             22.16 (2.57)         91.64

Win_size = 7             12              40             10.36 (3.36)         62.78
Win_shift = 7            10              30             20.38 (5.57)         75.56
l = 42, S = 142           9              20             30.75 (7.91)         93.28

Table 2.  Use of various cutting parameters for generating cutting force signals.

Experiment    Axial depth     Feed rate    Spindle        Spindle
number        of cut (mm)     (m/min)      speed (rpm)    diameter (mm)

    1            1.34            90.6         800             50
    2            1.016          125.4         500             40
    3            1.524           50.8         700             40



Table 3.

Results of tool breakage detection under different cutting conditions. Column 5 shows the number
of detectors (as %) activated when novel patterns were encountered in periods corresponding to a
broken tooth. Column 6 shows detection rates. Note that detection rates are high (95 - 100 %)
when monitoring broken tooth periods.

Encoding             Experiment    Matching         Total number     Detectors        Broken periods
parameters           number        threshold (r)    of detectors     activated (%)    detected (%)
                                                    generated

Win_size = 6             1              8               30               75                98
Win_shift = 6            2              8               40               72               100
l = 30, N_s = 166        3              9               50               63                96

Win_size = 8             1              9               40               78                99
Win_shift = 8            2             10               50               73                98
l = 40, N_s = 125        3             11               70               67                95

CONCLUSION 

In this paper, we have proposed a method for tool breakage detection based on principles inspired by the
natural immune system. The objective of this work is to develop an efficient detection algorithm that can be
used to alert an operator to any changes in steady-state characteristics of milling cutter dynamics. The
results demonstrate that the proposed algorithm can successfully detect tooth breakage from the
dynamic variation of the cutting force signals. Note that our approach relies on a large enough
sample of normal sensory data to generate a diverse set of detectors that probabilistically notice any
deviation from normal operation. Because it does not look for any particular (or known) fault, but rather
indicates that patterns are novel with respect to the normal behavior pattern, this algorithm could be
incorporated into existing diagnostic tools for further classification.

The detection system can be updated quickly by generating a new set of detectors as the normal milling
operation is modified by tool-workpiece geometry, change in cutting conditions, etc. Forrest et al. [10]
show that even a small set of detectors has a high probability of noticing changes in the original data set.

In most monitoring systems, detection of spurious changes in sensor measurements is not as important as
the gradual change in pattern over a period of time, so our probabilistic detection algorithm is a promising
alternative for such problems. It may be necessary to choose a fault-detection threshold that allows
instantaneous variations or spikes from the established normal patterns while monitoring real sensor data.

There are a number of parameters that are tunable in both preprocessing and detector generation. During
preprocessing, the desired precision can be achieved by grouping similar analog data in the same bin, and a
suitable window size can be chosen to capture regularities in the data patterns. Note that the system can
monitor using different time scales simultaneously. Instead of directly encoding time-series data, it may be
necessary to transform data (e.g., by Fourier transform) depending on the data properties. It is also possible
to combine several sensor signals (sensor fusion) to improve system reliability [22], particularly when a
single sensor does not correlate well with all the anomalies to be detected. Decisions based on multiple
sensors provide more information simultaneously with higher quality than decisions from a single sensor.

One simple data fusion method is weighted addition of sensor signals. A desired detection reliability can be
achieved by changing window size, matching threshold, or the number of detectors. The probability of a
match at r contiguous positions and the impact of different r values on overall computational behavior of
the algorithm are reported in [10]. Theoretical analysis and empirical experiments suggest the algorithm is
highly sensitive to r. We are currently investigating other matching rules and generation algorithms [23].

We have tested the feasibility of this detection algorithm on a number of data sets, including the Mackey
Glass series [9], and some real sensor data. The experiments suggest this detection algorithm can be useful
for many other similar problems, including fault detection, anomaly detection, machine monitoring,
signature verification, noise detection, patient condition monitoring, and so on. The idea of using immune
system principles in fault detection was also studied by others [24, 25]; however, they have chosen a
different set of principles to emulate process fault diagnosis. The remarkable detection ability of biological
immune systems suggests negative-selection algorithms are well worth exploring in industrial applications.

REFERENCES 

1.  Altintas,  Y.  and  Yellowley,  I.,  1989.  In-Process  Detection  of  Tool  Failure  in  Milling  using  Cutting  Force 
Models.  J.  Eng.  for  Ind.,  Ill,  149-157. 

2.  Tansel,  I.N.,  McLaughlin,  C.,  1993.  Detection  of  tool  breakage  in  milling  operations-I.  The  time  series 
analysis  approach.  Int.  J.  Mach.  Tools  &Manu.,  33(4),  531-544. 

3.  Chryssolouris,  G.,  Guillot,  M.,  1990.  A  comparison  of  statistical  and  AI  approaches  to  the  selection  of 
process  parameters  in  intelligent  machining.  Trans.  ASME;  J.  Eng.  for  Ind.,  1 12,  122-130. 

4.  Li,  C.J.,  Wu,  S.M.,  1989.  On-line  Detection  of  Localized  defects  in  bearings  by  pattern  recognition 
analysis.  Transactions  of  the  ASME;  Journal  of  Engineering  for  Industry,  1 12,  331-336. 

5.  Du,  R.X.,  Elbestawi,  M.A.,  Li,  S.,  1992.  Tool  condition  monitoring  in  turning  using  fuzzy  set  theory.  In 
Int.  J.  Mach.  Tools  &  Manu.,  32(6),  781-796. 

6.  Tansel,  I.N.,  McLaughlin,  C.,  1993.  Detection  of  tool  breakage  in  milling  operations-II.  The  neural 
network  approach.  Int.  J.  Mach.  Tools  &  Manu.,  33(4),  545-558. 

7.  Guillot,  M.,  Ouafi,  A.E.,  1991.  On-line  Identification  of  Tool  Breakage  in  Metal  Cutting  Processes  by 
use  of  Neural  Networks.  Intelligent  Engineering  Systems  through  Artificial  Neural  Networks,  ASME 
Press,  New  York ,  I,  701-709. 

8.  Kozma,  R.,  Kitamura,  M.,  Sakuma,  M.,  Yokoyama,  Y.,  1994.  Anomaly  Detection  by  neural  network 
models  and  statistical  time  series  analysis.  Proc.  IEEE  Int.  Conf.  Neural  Networks,  Orlando,  FL. 

9.  Caudell,  T.P.,  Newman,  D.S.,  1993.  An  Adaptive  Resonance  Architecture  to  Define  Normality  and  Detect 
Novelties  in  Time  Series  and  Databases.  IEEE  World  Cong.  Neural  Networks,  Portland,  IV,  166-176. 

10.  Forrest,  S.,  Perelson,  A.S.,  Allen,  L.,  Cherukuri,  R.,  1994.  Self-Nonself  Discrimination  in  a  Computer. 

Proc.  IEEE  Symp.  Research  in  Security  and  Privacy,  Oakland,  CA,  202-212. 

11.  Percus,  J.K.,  Percus,  O.,  Person,  A.S.,  1993.  Predicting  the  size  of  the  antibody  combining  region  from 
consideration  of  efficient  self/non-self  discrimination.  Proc.  Nat.  Acad.  Sci.,  60,  1691-1695. 

12.  Helman,  P.,  Forrest,  S.,  1994.  An  Efficient  Algorithm  for  Generating  Random  Antibody  Strings. 

Technical  Report  Technical  Report  No.  CS94-7,  Dep't.  Comp.  Sci.,  University  of  New  Mexico. 

13.  Elbestawi,  M.A.,  Ismail,  F.,  Du,  R.,  Ullagaddi,  B.C.,  1994.  Modelling  machining  dynamics  including 
damping in the tool-workpiece interface. J. Eng. for Ind., 116, 435-439.

14. Palmai, Z., 1987. Cutting Temperature in Intermittent Cutting. Int. J. Mach. Tools & Manu., 27(2), 261-274.

15. Moore, T., Reif, Z., 1992. Detection of Tool Breakage using Vibration Data. Proc. N.A. Manu. Res.
Conf. (13th NAMRC); SME Trans. Manu. Eng., 45-50.

16. Takata, S., Ogawa, M., Bertok, P., Ootsuka, J., Matushima, K., Sata, T., 1985. Real-Time Monitoring
System  of  Tool  Breakage  using  Kalman  Filtering.  Robotics  &  Computer-Integrated  Manu.,  2(1),  33-40. 

17. Liang, S.Y., Dornfeld, D.A., 1989. Tool wear detection using time series analysis of acoustic emission.
In Journal of Engineering for Industry, 111, 199-205.

18. Tarng, Y.S., Lee, B.Y., 1992. Use of model-based cutting simulation system for tool breakage
monitoring in milling. In International Journal of Machine Tools & Manufacturing, 32(5), 641-649.

19.  Li,  G.S.,  Lau,  W.S.,  Zhang,  Y.Z.,  1992.  In-Process  Drill  wear  and  breakage  monitoring  for  a  machining 
centre based on cutting force parameters. Int. J. Mach. Tools & Manu., 32(6), 855-867.

20.  Tlusty,  J.,  Ismail,  F.,  1983.  Special  aspects  of  chatter  in  milling.  Trans.  ASME;  Journal  of  Vibration, 
Acoustics,  Stress,  and  Reliability  in  Design,  105,  24-31. 

21.  Frank,  P.M.,  1990.  Fault  diagnosis  in  dynamic  systems  using  analytical  and  knowledge-based 
redundancy  -  A  survey  and  some  new  results.  Automatica,  26(3),  459-474. 

22. Dornfeld, D.A., 1990. Neural network sensor fusion for tool condition monitoring. Annals CIRP, 24,
101-105. 

23. Dasgupta, D., Forrest, S., 1996. Novelty detection in time series data using ideas from immunology.
ISCA 5th Int. Conf. Intelligent Systems, Reno, Nevada.

24.  Ishida,  Y.,  Mizessyn,  F.,  1992.  Learning  Algorithms  on  an  Immune  Network  Model:  Application  to 
Sensor  Diagnosis.  Proc.  Int.  Joint  Conf.  Neural  Networks,  China,  I,  33-38. 

25.  Ishida,  Y.,  1993.  An  Immune  Network  Model  and  its  Applications  to  Process  Diagnosis.  Systems  and 
Computers  in  Japan,  24(6),  38-45. 




Inductive  Learning  for  Optimization  of  Simulation  Model  Output 

Rainer Barton *, Helena Szczerbicka **


* German Aerospace Center (DLR), Institute for Flight Mechanics, Lilienthalplatz 7,
38108 Braunschweig, Germany, Email: barton@informatik.uni-bremen.de

**  Department  of  Computer  Science  and  Mathematics  -  FB3,  University  of  Bremen, 
28334  Bremen,  Germany,  Email:  helena@informatik.uni-bremen.de 


ABSTRACT 

In  this  article  we  present  the  optimization  approach,  'ML-Opt',  which  approximates  the  structure  of  an 
unknown  goal  function  by  analyzing  functional  dependency  between  search  points.  The  functional 
dependency  is  determined  by  an  inductive  learning  algorithm,  which  generates  a  classifier  used  as  a  control 
structure  in  the  optimization  process.  A  numerical  example  and  discussions  are  presented. 


INTRODUCTION 

Parameter optimization is one of the most important issues in a simulation study. In this setting,
optimization is a kind of blind, directed stochastic search in a solution space. Direct optimization
strategies are normally used to solve an optimization problem $f^{+}: \mathbb{R}^n \to \mathbb{R}^m$
given by a simulation model with $n$ input parameters. To this end, the $m$-dimensional model output is
transformed by a problem-dependent goal function $f^{*}: \mathbb{R}^m \to \mathbb{R}$. In this paper we
concentrate on the optimization of the function $f: \mathbb{R}^n \to \mathbb{R}$ with
$f(x) = f^{*}(f^{+}(x))$. The optimization task is then to find a parameter vector
$x = (x_1, x_2, \ldots, x_n)$ which yields the global or at least a local extremum of the goal function $f$.

To  optimize  simulation  models,  only  direct  methods  not  using  gradient  information  are  applicable.  These 
methods  can  be  divided  into  global  and  local  optimization  strategies.  Today  the  most  common  direct 
methods  for  global  optimization  are  Genetic  Algorithms  [1],  Evolution  Strategies  [1]  and  Simulated 
Annealing  [2].  For  local  optimization  the  most  common  strategy  is  hill  climbing  [3].  Generally, 
optimization methods work iteratively, requiring only goal function values $f(x) = y$ to determine a
parameter  vector  of  an  extremum.  However,  the  user  is  often  interested  in  alternate  solutions  close  to  the 
extremum,  e.g.,  to  reduce  costs,  increase  flexible  performance  of  a  single  subtask  or  reconfigure  a  system. 

After  studying  direct  optimization  algorithms,  we  have  noticed  that  common  optimization  strategies  either 
apply  a  functional  dependency  between  the  recently  found  search  points  to  locate  an  extremum  (e.g.  hill 
climbing)  or  they  combine  information  between  single  search  points  to  explore  the  search  space  (e.g. 
Genetic  Algorithms).  We  thought  it  might  be  possible  to  exploit  more  information  than  only  that  given  by 
the  evaluations  of  the  recently  found  search  points.  Indeed,  we  developed  a  global  optimization  method 
ML-Opt  (machine  learning  optimization)  which  learns  the  structure  of  the  goal  function  and  applies  this 
information  to  generate  new  search  points. 

In our approach, an inductive learning algorithm generates from a set of search points (the population) a
classifier which divides the search space into regions. Some of these regions might contain a local
extremum. For this reason we use the classifier to decide which regions should be investigated more
closely by generating and evaluating new parameter settings of the goal function in these regions. The
extended population is the basis for a further iteration of ML-Opt, in which the search space is
partitioned again. Thus, the strategy ML-Opt yields two important answers to the optimization problem:
first, it locates a near-optimal solution itself; second, it determines regions in which alternative
near-optimal solutions may be located.


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 


In  section  two  of  this  paper,  inductive  learning  and  the  algorithm  C4.5  are  briefly  discussed.  In  the  next 
section  the  approach  ML-Opt  is  presented.  Section  four  reports  some  numerical  results.  The  last  section 
comments  upon  the  approach  and  discusses  some  further  improvements  of  the  algorithm. 


INDUCTIVE  LEARNING 

Machine learning is primarily devoted to representing knowledge or to drawing conclusions from data. One
field of machine learning is inductive learning, which is based on the generalization of example
knowledge. Let us briefly discuss an application of inductive learning to classification. A data set
$X = \{x^1, x^2, \ldots, x^N\}$ with $x^i = (x^i_1, x^i_2, \ldots, x^i_n)$, a set of examples
$E \subset X$ and a set of classes $C = \{c_1, c_2, \ldots, c_s\}$ are the basic inputs of an inductive
learning algorithm. In addition, for every example a relation $\sim$ with a class is known a priori:

$\forall e^i \in E \;\; \exists c_j \in C : e^i \sim c_j$    (1)

Together  with  their  related  classes,  the  examples  are  the  initial  information  base  for  a  learning  process  (see 
Figure  1).  The  learning  process  builds  a  knowledge  base  out  of  the  information  base  by  means  of 
generalization.  This  knowledge  base  is  applied  to  an  interpretation  process  to  classify  elements  of  the  data 
set.  Here  classification  is  the  process  of  attaching  a  particular  class  to  a  data  set  element.  The  task  of 
inductive learning is the construction of the classification function $\phi: X \to C$. It has to map all examples
correctly  regarding  the  relation  ~  and  it  has  to  map  every  element  of  the  data  set  into  a  class. 

For  the  purpose  of  classification  there  are  many  different  inductive  algorithms  such  as  neural  network, 
statistical and symbolic approaches [4]. Requirements on an inductive learning algorithm concern learning
time,  classification  time,  quality  of  generalization,  classification  accuracy  and  usability. 


Fig.  1.  Inductive  Learning. 


[Figure 2: examples in the data element space S x T; decision tree over X = (x1, x2) with the tests
x1 < s and x2 < t; extracted rule: IF x1 > s AND x2 < t THEN class •]


Fig.  2.  C4.5  Classification  Example. 

In our approach we use C4.5 [4], [5], one of the most widely used inductive learning algorithms due to its
good performance and transparent description of the relations between attribute values and the classes. The
algorithm partitions examples by the gain and gain ratio criteria. C4.5 generates a classification function
in the form of a decision tree. For example, the left side of Figure 2 represents a set of two-dimensional
examples belonging either to the class • or to the class □. C4.5 partitions the data space into regions
related to the two classes. The middle of the figure displays its classifier and an extracted rule to
classify a new element in the data space. The right side shows the partitioning of the data space
generated by the classifier.
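
A rule extracted from such a tree is a conjunction of threshold tests, so it can be read as an
axis-aligned box in the data space. The following Python sketch illustrates this reading for the rule of
Figure 2. The function name `rule_to_region`, the tuple encoding of the conditions and the numeric
thresholds are assumptions made for illustration; they are not part of C4.5.

```python
import math


def rule_to_region(conditions, dimension):
    """Turn a conjunction of threshold tests into an axis-aligned box.

    `conditions` is a list of (attribute index, operator, threshold) with
    operator '<' or '>'. Attributes that are not tested stay unbounded.
    """
    region = [[-math.inf, math.inf] for _ in range(dimension)]
    for index, op, threshold in conditions:
        if op == "<":
            region[index][1] = min(region[index][1], threshold)
        else:
            region[index][0] = max(region[index][0], threshold)
    return [tuple(bounds) for bounds in region]


# Hypothetical thresholds s and t for the rule "IF x1 > s AND x2 < t THEN class •".
s, t = 0.4, 0.7
region = rule_to_region([(0, ">", s), (1, "<", t)], dimension=2)
```

In practice the unbounded sides would be clipped to the limits of the search space.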


OPTIMIZATION  WITH  ML-Opt 

The  central  idea  of  ML-Opt  is  to  apply  a  machine  learning  algorithm  as  a  control  element  in  the 
optimization  process.  For  a  detailed  description  of  the  algorithm  we  present  its  control  structure  in  pseudo 
code  in  Figure  3.  In  the  following  subsection  the  different  tasks  and  their  internal  parameters  are  described. 

Initialization  Task 

At the beginning an initial population is generated. Its size depends on the dimension of the input vector
$x$ and, if prior information about the goal function is available, on the modeling of the goal function
$f$. The classification criterion $\chi$ reflects the relevance of search points in an iteration of ML-Opt.
It separates the population elements $(x, f(x))$ into at least two classes. For a maximization process we
apply the following classification criterion:

$\forall (x, f(x)) \in points: \quad \chi(x, f(x)) = \begin{cases} \square & \text{if } f(x) < limit \\ \bullet & \text{otherwise} \end{cases}$    (2)

The class □ reflects relatively low values of the goal function and the class • reflects relatively high
values of the goal function. The threshold value limit is defined as the k-th highest value of the goal
function in the population. Alternatively, the median or arithmetic average could be applied.
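
The classification criterion of equation 2 can be sketched in Python as follows. This is only an
illustration: the function name `classify_population`, the class labels "low" and "high" (standing for □
and •) and the sample population are assumptions, as is the choice to place points with f(x) equal to
limit in the high class.

```python
def classify_population(points, k):
    """Label each (x, fx) pair as 'high' or 'low' for a maximization run.

    `limit` is the k-th highest goal function value in the population;
    values below it get the low class, all others the high class.
    """
    values = sorted((fx for _, fx in points), reverse=True)
    limit = values[min(k, len(values)) - 1]
    return [(x, fx, "low" if fx < limit else "high") for x, fx in points]


# Hypothetical one-dimensional population of (x, f(x)) pairs.
population = [((0.5,), 1.0), ((1.5,), 4.0), ((2.5,), 3.0), ((3.5,), 0.2)]
labelled = classify_population(population, k=2)
```

With k=2 the limit is the second-highest value, so exactly the two best points of this population fall
into the high class.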

PROCEDURE ML-Opt ( FUNCTION f )
VAR
    points, newPoints : BAG OF POINT;
    χ                 : RELATION OF POINT;
    rules             : SET OF RULES;
    regions           : SET OF REGIONS;
BEGIN
    points := initializePopulation( f );
    REPEAT
        χ         := defineClassCriterium( points );
        rules     := machineLearning( points, χ );
        regions   := regionGeneration( rules );
        newPoints := exploreRegion( regions );
        points    := points ∪ newPoints;
    UNTIL ( terminationCondition( points, regions ) )
    RETURN ( findExtremum( points ) );
END;


Fig.  3.  ML-Opt  in  Pseudo  Code 
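
The control structure of Fig. 3 can be sketched in Python as follows. This is a minimal illustration under
stated assumptions, not the authors' implementation: the learning and region generation steps are passed in
as a single function `learn_regions`, the demo uses a crude stand-in (the bounding box of all high points)
instead of C4.5, and a fixed iteration count replaces the termination condition. All names and default
values are hypothetical.

```python
import random


def ml_opt(f, bounds, learn_regions, n_init=10, n_explore=2, k=4, max_iter=20, seed=0):
    """Maximize f over the box `bounds`, following the loop of Fig. 3."""
    rng = random.Random(seed)

    def sample(box):
        # Monte Carlo sampling: one uniform point inside an axis-aligned box.
        return tuple(rng.uniform(lo, hi) for lo, hi in box)

    # initializePopulation
    points = []
    for _ in range(n_init):
        x = sample(bounds)
        points.append((x, f(x)))

    for _ in range(max_iter):  # fixed count as a stand-in for terminationCondition
        # defineClassCriterium: limit is the k-th highest goal function value
        limit = sorted((fx for _, fx in points), reverse=True)[k - 1]
        labelled = [(x, fx >= limit) for x, fx in points]
        # machineLearning and regionGeneration
        regions = learn_regions(labelled)
        # exploreRegion
        new_points = []
        for box in regions:
            for _ in range(n_explore):
                x = sample(box)
                new_points.append((x, f(x)))
        points = points + new_points

    # findExtremum
    return max(points, key=lambda p: p[1])


def bounding_box_learner(labelled):
    """Stand-in for C4.5: a single region, the bounding box of all high points."""
    high = [x for x, is_high in labelled if is_high]
    return [[(min(c), max(c)) for c in zip(*high)]]


best_x, best_f = ml_opt(lambda x: -(x[0] - 1.0) ** 2, [(-5.0, 5.0)], bounding_box_learner)
```

Because the population only grows, the returned value can never be worse than the best point of the
initial population. A real learner would return several regions, one per rule of the high class.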


Learning  Task 

In every optimization step a new classification function is generated by an inductive learning process in
order to find a new partitioning of the search space. The examples for the inductive learning process are
created from the points of the current population and the classification criterion $\chi$. The example
set $E$ is thus defined as:

$E = \{(x, \chi(x, f(x))) \mid (x, f(x)) \in points\}$    (3)

In our approach, we use C4.5 as the inductive learning algorithm to classify the whole example set $E$. Its
parameters are set such that C4.5 does not prune the calculated decision tree and a leaf of the decision
tree may cover as few as one example. A set of rules is generated in the learning process. It is derived
from the resulting decision tree.

Each  rule  covers  a  particular  region  in  the  search  space  of  the  goal  function.  Due  to  the  classification 
criterion  only  regions  covering  elements  of  the  class  •  are  interesting  for  a  maximization  process  (called 
'high  regions').  In  every  high  region  new  search  points  are  generated.  In  the  next  iteration  of  the 
optimization  process  the  additional  points  support  a  more  accurate  partition  of  the  search  space.  For  the 
generation  of  new  search  points  many  different  algorithms,  e.g.  Latin  Hypercube  sampling  or  equidistant 
distance  sampling,  can  be  used.  We  prefer  Monte  Carlo  sampling. 
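
Monte Carlo sampling inside a region can be sketched as follows, assuming that a region is an axis-aligned
box given by one (low, high) pair per input parameter. The function name `sample_region` and the example
bounds are hypothetical.

```python
import random


def sample_region(region, count, rng=random):
    """Draw `count` points uniformly at random from an axis-aligned box.

    `region` is a list of (low, high) bounds, one pair per input parameter.
    """
    return [tuple(rng.uniform(lo, hi) for lo, hi in region) for _ in range(count)]


# Four new search points in a hypothetical two-dimensional high region.
rng = random.Random(0)
new_points = sample_region([(0.0, 1.0), (2.0, 5.0)], count=4, rng=rng)
```

Passing a seeded `random.Random` instance keeps a run reproducible.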

Region  Generation 

Any high region corresponding to the 'high' values of $f$ may cover a local extremum. However, the
probability that an extremum is located outside a high region is not zero. If search points were generated
only inside a high region, an extremum lying outside would never be discovered. In order to avoid this
case we extend high regions in the direction of the growing (descending) value of the goal function (see
Fig. 4). The point with the highest evaluated value of the goal function in a region is called the
critical point of the region. If the critical point lies in an $\varepsilon$-neighborhood of a high region
boundary, then the region is extended so that it contains a $d \cdot \varepsilon$-neighborhood of the
critical point ($d \in \mathbb{N}$).
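
The extension step can be sketched as follows, assuming that regions are axis-aligned boxes and that
neighborhoods are measured per coordinate (a maximum-norm box). Both are assumptions of this illustration,
as are the function name `extend_region` and the numeric values.

```python
def extend_region(region, critical_point, eps, d):
    """Extend a box around its critical point.

    If the critical point lies within eps of any boundary of the box, the box
    is enlarged so that it contains the d*eps-neighborhood of that point.
    Otherwise the box is returned unchanged.
    """
    near_boundary = any(
        c - lo <= eps or hi - c <= eps
        for (lo, hi), c in zip(region, critical_point)
    )
    if not near_boundary:
        return list(region)
    radius = d * eps
    return [
        (min(lo, c - radius), max(hi, c + radius))
        for (lo, hi), c in zip(region, critical_point)
    ]


region = [(0.0, 1.0), (0.0, 1.0)]
# The critical point is close to the upper boundary of the first coordinate.
extended = extend_region(region, critical_point=(0.95, 0.5), eps=0.1, d=2)
```

Here only the upper bound of the first coordinate moves outward, since the rest of the neighborhood
already lies inside the box.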

Termination  Condition 

The termination condition is built out of two stop criteria dealing with the average value (called
'fitness') of the $N_{Fitness}$ best values of the goal function in the current population. First, ML-Opt
checks whether the fitness belongs to the $\delta$-neighborhood of the extremum found in the current
iteration step. The second termination criterion concerns whether ML-Opt can improve on the
$\delta$-neighborhood within $h$ iterations.
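
One possible reading of the two stop criteria is sketched below. It is an interpretation, not the authors'
definition: $\delta$ is taken as a fraction of the current extremum, the second criterion is read as "the
best value has not improved by more than this tolerance over the last h iterations", and the two results
are returned separately because the text does not state how they are combined. The names `fitness`,
`stop_criteria` and `best_history` are hypothetical.

```python
def fitness(values, n_fitness):
    """Average of the n_fitness best goal function values (maximization)."""
    best = sorted(values, reverse=True)[:n_fitness]
    return sum(best) / len(best)


def stop_criteria(values, best_history, n_fitness, delta, h):
    """Return (close, stalled) for the current population.

    values       -- goal function values of the current population
    best_history -- best value found after each iteration so far
    """
    extremum = max(values)
    tolerance = delta * abs(extremum)
    close = abs(extremum - fitness(values, n_fitness)) <= tolerance
    stalled = (
        len(best_history) > h
        and best_history[-1] - best_history[-1 - h] <= tolerance
    )
    return close, stalled


values = [10.0, 9.999, 9.998, 5.0]
result = stop_criteria(values, [9.0, 9.999, 10.0, 10.0, 10.0], n_fitness=3, delta=0.0005, h=2)
```

The value 0.0005 corresponds to a neighborhood of 0.05% of the located extremum.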


Fig. 4. Extension of a Region (d = 2).


Fig.  5.  Reduction  of  a  Region. 


Region  Reduction 

Classification with C4.5 may result in regions which cover additional parts of the search space compared
with the regions of the last iteration (see Figure 5). On the one hand, these parts could be an extension
of regions in the direction of a possible extremum. On the other hand, these parts were not classified in
the previous iterations. Carefully analyzing the behavior of C4.5, we noticed that the additional search
space results from a classification problem. That is why we reduce every region by calculating its
intersection with the convex hull of all high regions of the last iteration. After this, the region is
extended in the direction of an extremum if necessary.




Example  of  Execution 

The dynamic behavior of ML-Opt is visualized in Figure 6. The start population of ML-Opt has ten search
points, and in each following iteration two additional new search points are evaluated in every high
region. For the definition of the two classes • and □ see equation 2. The value limit is set to the
fourth-best value of the goal function over the search points in the current population. In the first
iteration ML-Opt investigates three regions, and it terminates in one region in its last iteration. Notice
that in every iteration the classification of examples by the classifier changes due to increasing
knowledge about the behavior of the goal function.


Fig.  6.  Dynamic  Behavior  of  ML-Opt 


NUMERICAL  EXAMPLE 

The developed optimization approach has already been applied to several simulation models as well as to a
diverse set of mathematical test functions. Let us consider the following numerical test function (see
Table 1), which defines a multimodal function used for maximization (see Figure 7 for the two-dimensional
case). For all dimensions it has exactly one global maximum point $x = (7.98, 7.98, \ldots, 7.98)$. The
number of local maximum points rises exponentially with the dimension $n$. We performed experiments for
$n=1$ to $n=10$.

With this test function we compare several results of ML-Opt (average accuracy of the located optimum,
average number of evaluations and average number of iterations) with other strategies. One thousand
independent runs have been performed for every experiment entry. Different control parameter settings are
used for the experiments. Table 2 lists the values of the ML-Opt parameters: the size of the start
population $N_{MC}$ generated with Monte Carlo sampling, the number of exploring evaluations in a single
rule $N_{Explore}$ and the number of fitness elements $N_{Fitness}$. The threshold variable limit is
defined by the value of the goal function of the fourth-best element. The $\delta$-neighborhood is set to
0.05% of the located extremum in the current iteration.


Table  1.  Mathematical  Test  Function 


Search space:   $\{x \in \mathbb{R}^n \mid 0 < x_i < 3\pi \,\wedge\, i \in \mathbb{N}\}$

Goal function:  $f(x) = \left| \prod_{i=1}^{n} x_i \cdot \sin(x_i) \right|$
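
The goal function of Table 1 is easy to evaluate directly. The following Python sketch implements it and
uses a coarse grid search in one dimension to check the position of the global maximum stated in the text.
The function name `goal` and the grid resolution are choices made for this illustration.

```python
import math


def goal(x):
    """f(x) = |prod_i x_i * sin(x_i)| for a parameter vector x."""
    product = 1.0
    for xi in x:
        product *= xi * math.sin(xi)
    return abs(product)


# Coarse one-dimensional grid over (0, 3*pi) to locate the maximizer.
grid = [i * 3 * math.pi / 10000 for i in range(1, 10000)]
best = max(grid, key=lambda v: goal([v]))
```

Since the function is a product of identical one-dimensional factors, the maximizer found for n=1 carries
over to every coordinate in higher dimensions.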

Table  2.  Settings  of  ML-Opt 


Exp.   N             N_MC         N_Explore     N_Fitness
1      1,2,...,10    10 + N·10    4             [illegible]
2      6             20           4,6,...,12

Fig.  7.  Graphical  Representation. 

Since we are dealing with a heuristic algorithm, we validate its behavior by comparing it with Genetic
Algorithms. For Genetic Algorithms we used the software tool REMO [6] with standard control parameter
settings recommended in the literature. The most important control parameter values are as follows: the
cross-over rate is set to 0.75% and the mutation rate is set to 0.075%. The length of a parameter segment
in the fixed-length binary strings representing parameter vectors is set to 10 bits. For the experiment
the size of the initialization population is set to 40 elements, and the termination condition is met when
the Genetic Algorithm cannot improve the goal function by 0.05% within 20 generations. The population size
is set to 10 elements.


[Figure 8: two panels, each plotted against 'Problem Dimension']


Fig.  8.  Comparing  ML-Opt  with  Genetic  Algorithms. 

The  strategies  correctly  located  the  global  extremum  with  an  accuracy  corresponding  to  -y-  -neighborhood. 

In the first experiment (see Fig. 8) the probability of locating the global extremum (called 'hit
probability') decreases in both optimization algorithms with increasing problem dimension. The lower hit
probability of ML-Opt results from the small number of investigations in a region and from considering at
most four high regions for further analysis. By changing the parameters of ML-Opt we are able to improve
its hit probability (see experiment two). From N=2 to N=6 the average evaluation number of ML-Opt increases
steadily by a dimension-dependent offset: $eval_N - eval_{N-1} = 20 \cdot N$. The non-smooth trajectories
can be explained by the small experiment size of one thousand independent experiment runs.

In Figure 9 some results of the second experiment are presented for N=6. The hit probability and the
average number of evaluations in a single experiment increase if either $N_{Explore}$ or $N_{Fitness}$
increases. An interpretation for the influence of $N_{Explore}$ is a more detailed exploration of regions.
The second parameter influences the termination condition, leading to a more detailed search in the latest
regions. Thus both result in a higher probability of detecting parameter settings near the global extremum
through a larger sample size in interesting regions.


Fig.  9.  Variation  of  Internal  Parameter. 


Due  to  the  limited  space  of  this  article,  we  can  only  briefly  refer  here  to  other  methods  to  improve  the  hit 
probability. By changing the influence of the threshold variable limit to the value of the k-th highest element
in a population (k>4), more high regions can be defined by the learning algorithm. This results in a more
detailed  analysis  of  the  search  space. 


CONCLUSIONS 

Our  approach  directly  integrates  methods  of  machine  learning  in  the  optimization  process.  In  this  paper  the 
basic  structure  of  the  optimization  algorithm  ML-Opt  was  presented.  It  applies  the  inductive  learning 
algorithm C4.5 as a control in an optimization process. The optimization differs from existing direct
methods in building and using a metamodel of the goal function, generated by a classifier, and in learning
from history information. We are aware that, according to the no-free-lunch theorem, it is not possible to
make any general statement about the quality of the behavior of search algorithms.

The topics for our future work will concern the integration of other inductive learning algorithms in
ML-Opt, improving the Monte Carlo search in high regions via Latin hypercube search (or other methods from
experimental design) and the definition of a classification criterion which learns the slope of the goal
function as well. By using more than two classes a more careful analysis of the search space might be
possible. Other topics concern methods for the generation of additional new search points in the whole
search space, parallelization of single tasks, the analysis of termination conditions which depend on the
volume of the regions, and criteria for the applicability of ML-Opt prior to direct optimization algorithms.


REFERENCES 

1. Z. Michalewicz, 1996. Genetic algorithms + data structures = evolution programs. Springer-Verlag,
Berlin Heidelberg.

2. E. Aarts and J. Korst, 1990. Simulated annealing and Boltzmann machines. Wiley.

3. R. Hooke and T. A. Jeeves, 1961. Direct search solution of numerical and statistical problems.
Communications of the ACM, 8, 212-221.

4. T. M. Mitchell, 1997. Machine Learning. McGraw Hill.

5. J. R. Quinlan, 1993. C4.5: Programs for machine learning. Morgan Kaufmann Publishers, Inc.

6. M. Syrjakow and H. Szczerbicka, 1994. Optimization of simulation models with REMO. Proceedings
of the European Simulation Multiconference ESM'94, Barcelona, Spain, June 1-3, 274-281.




A  Genetically-Optimised  Fuzzy  Parser  of  Natural  Language 

Olgierd  Unold 


Institute  of  Engineering  Cybernetics,  Wroclaw  University  of  Technology 
Wyb.  Wyspianskiego  27,  50-370  Wroclaw,  Poland 
Email: unold@ci.pwr.wroc.pl  Web site: http://www.ict.pwr.wroc.pl/~unold


ABSTRACT 

This paper presents a genetic approach to inductive learning of a natural language parser. The parser is based
on  a  fuzzy  automaton  and  works  in  the  stratificational  knowledge  representation  system. 


INTRODUCTION 

The goal of our research is to design a computer system with the same capability for language and
knowledge acquisition as a human being. To this end, we first have to equip the natural language
processing system with a "smart" parser, which can be taught while working. We have proposed a
self-learning parser for a dialog system, which can in fact learn the grammatical rules from the acquired
sentences.

When developing the architecture of an adaptable analyzer, we should use a powerful global optimization
method suited to a problem with a large search space, which contains possible difficulties like noise. Due
to their population-based approach, genetic methods are ideally suited to searching for the architecture
of a self-learning analyzer. The problem of inducing parsers from actual sentences of the language, also
known as grammatical inference, is a very hard machine learning problem. Many researchers have attacked
this problem [1, 4, 6, 8]. To the author's knowledge, the problem of inferring a fuzzy automaton-driven
analyzer has not yet been considered.

This paper describes a genetic approach to inferring the architecture of a natural language parser based on a kind
of  fuzzy  automaton,  which  works  in  the  stratificational  knowledge  representation  system  (SKRS). 


STRATIFICATIONAL  KNOWLEDGE  REPRESENTATION  SYSTEM 

In  [10-12]  we  have  proposed  a  fuzzy  analyzer  of  natural  language  texts  (so-called  fPDMAS  -  fuzzy 
nondeterministic  pushdown  automaton  with  associative  memory  access)  which  works  in  the  stratificational 
knowledge  representation  system  [9]. 

Stratificational  knowledge  representation  system  is  an  attempt  at  formalizing  the  multilayer  structure  of 
natural  language.  The  SKRS  is  based  on  a  multistratum  semantic  network  whose  nodes  contain  particular 
linguistic  units  (single  words,  the  so-called  structural-semantic  components,  sentences).  The  node-to-node 
links  correspond  to  relations  between  particular  linguistic  units. 

If  a  natural  language  text  representing  the  acquired  knowledge  is  to  be  mapped  onto  the  stratificational 
semantic  network,  the  particular  linguistic  units  must  be  isolated  in  the  analysed  text,  such  as  word  groups 
and  sentences.  If  we  decide  to  limit  our  input  texts  to  isolated  clauses,  we  can  limit  the  sentence 
decomposition  task  to  two  subtasks:  one  of  word  group  decomposition  into  the  particular  words  and  one  of 
sentence  decomposition  into  word  groups.  In  order  to  increase  both  knowledge  integrity  in  the  knowledge 
base  and  the  efficiency  of  the  search  algorithm,  an  assumption  has  been  made  that  it  is  not  phrases,  but  the 
structural-semantic  components  (the  components)  that  are  the  objective  of  the  sentence  decomposition,  the 
components  being  word  groups  constituting  semantic  units  which  can  no  longer  be  decomposed,  each  of 
which  has  its  own  syntax  characteristics.  Sentence  decomposition  into  structural-semantic  components  in 
the SKRS is performed by a fuzzy automaton-driven analyzer fPDAMS. An automaton decomposition model can be
used in the SKRS because the process of sentence decomposition into structural-semantic components reduces
to the process of isolating appropriate substrings of symbols in the string of symbols representing
lexical categories.


FUZZY  AUTOMATON-DRIVEN  PARSER 

The construction of the proposed analyzer is based on a fuzzy automaton [3]. Recent applications of the
finite-state approach in NLP show the usefulness of automata in this area of AI [7]. The basic advantage of
applying a fuzzy automaton to process natural language lies in the existence of an established algorithm
for the next state selection based on values of the characteristic functions, allowing the characteristic
function values to change with time. In this way the fuzzy automaton-driven parser "gets accustomed" to
the input syntax constructions, imitating human behavior.

Figure 1 shows a graph of a subautomaton $fA(l_1)$ that isolates from the string of the lexical categories
of the words a substring having the syntactical category of the nominal group. There is another
subautomaton, $fA(l_3)$, incorporated into the $fA(l_1)$ subautomaton structure, which accepts a substring
having the syntactical category of the adjectival group. The subautomaton $fA(l_3)$ is represented by the
subgraph $fG(l_3)$ in the transition graph. The label $w_p i_s$ means that a transition along the edge is
possible when $w_p$ (standing for the lexical category of the word from the analysed sentence) is the input
symbol that is being read. During the transition the instruction $i_s$ is performed. The label
$w_p i_s q l_u$ means that during the transition the instruction $i_s$ is performed with the parameters
$q$, $l_u$. The symbol $r_k$ denotes a specific set of type $w$ symbols; an edge marked with a label that
contains the $r_k$ symbol represents a bundle of edges, each of which is marked with one of the symbols
belonging to the $r_k$ set.

The transition from the state $q_i$ to $q_j$ at the input signal $w_p$ has some degree of membership
(usually denoted as $\mu_{ij} \in [0,1]$). The following rule holds for the characteristic function
$\mu_{ij}$, which describes the transition relation of the fPDAMS automaton: for a given state $q_i$ and
input $w_p$ of the automaton, the next state $q_j$ of the automaton is determined so as to maximize the
characteristic function value corresponding to the transition. Note that the fuzzy automaton is similar to
the stochastic automaton. Both fuzzy and stochastic automata are examples of acceptors of weighted
languages. However, in stochastic automata the transitions are applied according to a probability
distribution; there exists no uncertainty about the generated string. In fuzzy automata all applicable
transitions are executed to some degree, and the transition weights, in contrast to those of a stochastic
automaton, do not need to sum to 1 if there exist several alternative transitions at a state.
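
The next-state rule described above can be sketched in Python as follows. This is a simplified
illustration that ignores the stacks and instructions of the fPDAMS; the dictionary layout, the function
name `next_state` and the membership values are assumptions.

```python
def next_state(transitions, state, symbol):
    """Pick the successor state with the highest membership degree.

    `transitions` maps (state, input symbol) to a dictionary
    {successor state: membership degree in [0, 1]}.
    Returns (successor, degree), or None if no transition applies.
    """
    candidates = transitions.get((state, symbol), {})
    if not candidates:
        return None
    target = max(candidates, key=candidates.get)
    return target, candidates[target]


# Hypothetical fragment of a transition relation; the degrees of the two
# alternatives for input w7 deliberately do not sum to 1.
transitions = {
    ("q21", "w3"): {"q21": 0.2},
    ("q21", "w7"): {"q22": 0.7, "q23": 0.4},
}
chosen = next_state(transitions, "q21", "w7")
```

Letting the membership degrees change over time, as the text describes, would amount to updating the
values stored in this dictionary.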


Fig. 1. The transition subgraph $fG(l_1)$ of the fuzzy subautomaton $fA(l_1)$

The transition subgraph $fG(l_1)$ of the fuzzy subautomaton $fA(l_1)$ is described by a symbolic expression
$fG^{\sim}(l_1)$ as follows:




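$fG^{\sim}(l_1) = (q_{21}(0.2/w_3 i_1 q_{21},\ 0.7/w_7 i_1 q_{22}(0.8/w_2 i_1 q_{23}(0.5/w_1 i_1 q_{23},\ 0.2/w_6 i_1 q_{27}(1/r_0 i_7 q_{28}),\ 0.7/r_2 i_7 q_{28})),\ 0.3/r_9 i_1 q_{23},\ 0.4/r_3 i_5 q_{24} l_3 q_{61}),\ q_{24}(1/r_0 i_2 q_{25}(0.9/w_1 i_1 q_{23},\ 0.4/r_3 i_5 q_{61} l_3 q_{61})),\ q_{26}(\text{[illegible]}))$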

The transition graph represents the analysis of a fragment of the input string of lexical categories, the result of the analysis being stored on one of the six stacks of the fPDAMS automaton. The operation of the automaton is determined by a finite number of instructions. These instructions constitute the transition rules that allow the automaton to pass from one configuration to another. The limited number of instructions and the simple rules of operation make it possible to use the fPDAMS as a kind of linguistic tool. The formal definition of the fPDMAS was presented in [12].


A  GENETICALLY  OPTIMISED  fPDMAS 

Genetic algorithms (GAs) operate on a space of chromosomes, which represent the corresponding elements of the search space. The evolutionary scheme is simple and can be presented by the following self-explanatory pseudocode, where P(t) stands for the family of elements forming the GA search space at iteration t [5]:

begin
   iteration := 0
   initiate population P(0)
   evaluate population P(0)
   while (not termination criterion) do
   begin
      iteration := iteration + 1
      select P(iteration) from P(iteration-1)
      alter P(iteration)
      evaluate P(iteration)
   end
end

In one of the first approaches to an evolutionary scheme, namely evolutionary programming [2], finite state machines were used. Evolutionary programming can operate on fPDMAS's as follows:

1) Initially, a population of parent fPDMAS's is constructed, either randomly or by hand. Each fPDMAS is represented by its transition graph (coded by the symbolic expression).

2) The parents are tested in the environment; that is, each parent fPDMAS is presented with a collection of positive and negative example sentences. The fitness of the automaton can be measured as the fraction of correctly analyzed sentences.

3) Offspring fPDAMS's are created by mutating each parent automaton. Mutation can change both the topology and the connection weights. The topology-modifying operators change the topology of the fPDMAS by adding a new state to the transition graph or removing an existing state. The operators that change the connection weights operate on the terms wpisq,lu, modifying their individual elements.

4) The offspring are evaluated in the existing environment in the same manner as their parents.

5) Those fPDMAS's that provide the best fitness are retained to become the parents of the next generation.

6) Steps 3)-5) are repeated until the end condition is reached.
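Steps 1)-6) can be sketched in Python as a mutation-only loop. This is a minimal sketch under stated assumptions, not the authors' implementation: an individual is a plain list of transition weights standing in for an automaton, `toy_fitness` is a placeholder for the fraction of correctly analyzed sentences, and a fixed number of generations stands in for the end condition.

```python
import random


def evolve(parents, mutate, fitness, generations, rng):
    """Mutation-only evolutionary programming loop (steps 1-6).

    parents     -- initial population (step 1)
    mutate      -- function (individual, rng) -> offspring (step 3)
    fitness     -- function individual -> float, higher is better (steps 2, 4)
    generations -- fixed iteration count, standing in for the end condition
    """
    size = len(parents)
    for _ in range(generations):
        offspring = [mutate(p, rng) for p in parents]
        pool = parents + offspring
        pool.sort(key=fitness, reverse=True)   # step 5: keep the fittest
        parents = pool[:size]
    return parents


def mutate_weights(weights, rng):
    """Hypothetical weight-changing operator: perturb one weight, clip to [0, 1]."""
    child = list(weights)
    i = rng.randrange(len(child))
    child[i] = min(1.0, max(0.0, child[i] + rng.uniform(-0.1, 0.1)))
    return child


def toy_fitness(weights):
    """Placeholder for 'fraction of correctly analyzed sentences'."""
    return -sum((w - 0.5) ** 2 for w in weights)


rng = random.Random(0)
start = [[rng.random() for _ in range(4)] for _ in range(6)]
final = evolve(start, mutate_weights, toy_fitness, generations=50, rng=rng)
```

Because parents compete with their offspring for survival, the best fitness in the population never decreases from one generation to the next.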

When modeling evolutionary processes using evolutionary programming, it is necessary to use a fitness function to evaluate the performance, or fitness, of an individual chromosome (i.e. the fPDMAS). The chromosomes that are most fit are most likely to survive, while less fit chromosomes die off and are replaced by fitter ones. Several possible functions may be used to determine the fitness and efficacy of the fPDMAS.




Following  [6]  the  evaluation  function  can  in  fact  count  only  the  number  of  sentences  incorrectly  classified 
and  the  number  of  words  incorrectly  accepted  by  the  analyzer.  According  to  [4]  the  fitness  of  the  fPDAMS 
can  be  measured  on  the  basis  of  the  fraction  of  correctly  analyzed  sentences  or  can  be  extended  to  credit 
also  correctly  analyzed  substrings  of  each  positive  training  sentence. 


CONCLUSIONS 

We have proposed the concept of using a genetic approach to infer the natural language analyzer fPDAMS. The theoretical basis for the use of evolutionary programming to support the automated design of the architecture of the fPDAMS was provided. Fuzzification of the parser and its evolutionary development are the first steps toward a self-learning, "smart" analyzer of natural language texts. The adaptable sentence analyzer presented here, which can be taught while working, can be part of various NLP systems.


REFERENCES 

1. Brave S., 1996. Evolving Deterministic Finite Automata using Cellular Encoding, [in:] Koza J.R., Goldberg D.E., Fogel D.B., Riolo R.L. (eds.) Proc. of the First Annual Conference Genetic Programming 1996, Stanford University, CA, USA, MIT Press, 28-31.

2. Fogel L.J., Owens A.J., Walsh M.J., 1966. Artificial Intelligence through Simulated Evolution, J. Wiley, Chichester.

3.  Kandel  A.,  Lee  S.C.,  1979.  Fuzzy  Switching  and  Automata:  Theory  and  Applications,  Crane  Russak, 
New  York. 

4. Lankhorst M.M., 1994. Grammatical Inference with a Genetic Algorithm, [in:] Dekke L., Smit W., Zuidervaart J.C. (eds.) Proc. 1994 EUROSIM Conf. on Massively Parallel Processing Applications and Development, Elsevier, Amsterdam, 423-430.

5.  Michalewicz  Z.,  1992.  Genetic  Algorithms+Data  Structures=Evolution  Programs,  Springer  Verlag, 
Berlin. 

6.  Poli  R.,  1996.  Evolution  of  Recursive  Transition  Networks  for  Natural  Language  Recognition  with 
Parallel  Distributed  Genetic  Programming,  Technical  Report  CSRP-96-19,  School  of  Computer  Science, 
The  University  of  Birmingham. 

7.  Roche  E.,  Schabes  Y.,  1997.  Finite-State  Language  Processing,  A  Bradford  Book,  The  MIT  Press, 
Cambridge,  Massachusetts. 

8.  Schwehm  M.,  A  Massively  Parallel  Genetic  Algorithm  on  the  MasPar  MP-1,  [in:]  Albrecht  R.F.,  Reeves 
C.R.,  Steele  N.C.  (eds.)  Proc.  Int.  Conf.  Artificial  Neural  Nets  and  Genetic  Algorithms,  Wien,  Springer, 
502-507. 

9. Unold O., 1996. A Stratificational Knowledge Representation System, [in:] Vetulani Z., Abramowicz W. (eds.) Language and Technology, Academic Printing House PLJ, Warszawa, 177-181 (in Polish).

10. Unold O., 1997. Automatic Analysis of Natural Language Texts in Man-Machine Communication, [in:] Wojtkowski G. et al. (eds.), Systems Development Methods for the Next Century, Plenum Publishing Corp., New York, 185-193.

11. Unold O., 1998. A Fuzzy Automaton Approach to Dialog Systems, Proc. of the IASTED International Conference ASC'98, Cancun, Mexico, May, 215-218.

12.  Unold  O.,  1998.  Application  of  Fuzzy  Sets  in  Natural  Language  Processing,  Proc.  of  the  6th  Congress 
on  Intelligent  Techniques  and  Soft  Computing  EUFIT'98,  Aachen,  Germany,  September,  1262-1266. 




A  Genetic  Algorithm  Based  Approach  to  Solve 
Process  Plan  Selection  Problems 


M.K.Tiwari*,  S.K.Tiwari*,  D.  Roy*,  N.K.Vidyarthi**  and  S.Kameshwaran*** 

*  Department  of  Manufacturing  Engineering, 

National  Institute  of  Foundry  and  Forge  Technology, 

Hatia,  Ranchi-834  003, India. 

**  Department  of  Mechanical  Engineering 
NERIST, Nirjuli, Itanagar-791 109, India.
***SriVenkateshNagar,  Chennai-600092,  India. 


ABSTRACT 

Selection of a process plan is a crucial decision-making problem in manufacturing systems, owing to the presence of alternative plans arising from the availability of several machines, tools, fixtures, etc. Because of its impact on the performance of a manufacturing system, several researchers have addressed the plan selection problem in recent years. Selecting an optimal set of plans for a given set of parts becomes an NP-complete problem under multiobjective and fairly restrictive conditions. In this paper, a Genetic Algorithm (GA) is used to obtain a set of feasible plans, for given part types and production volumes, that minimizes the processing time, setup time and material-handling time, subject to the constraint of not overloading the machines. Obtaining near-optimal solutions by using different weights for different objectives in the GA is also studied.


INTRODUCTION 

Process planning is the systematic organization of the detailed methods by which parts are manufactured from raw material to finished product. With the possibility of alternative machines, setups and processes for manufacturing a particular part, plan selection in a manufacturing environment has become a crucial problem. Moreover, modern manufacturing systems may require a part to be produced simultaneously with any combination of part types and volumes, and to be re-routed adaptively to alternative machines in case of breakdown or overloading of the pre-assigned machine. Because it has a vital impact on manufacturing system performance, several researchers have examined the plan selection problem in recent years. Kusiak and Finke [1] developed a model to select a set of process plans with minimum cost of removing material and a minimum number of machine tools and other devices. Bhaskaran [2] provided a model that accounts for factors such as flow rate, processing time and processing steps. Zhang and Huang [3] extended the model of Bhaskaran [2] using a fuzzy approach to handle imprecise, conflicting objectives in plan selection. Seo and Egbelu [4] used Tabu search to select a plan based on product mix and production volume. Tiwari and Vidyarthi [5] addressed plan selection by accounting for similarity measures among the plans of the parts.

In this paper, we use a Genetic Algorithm (GA) to obtain a set of process plans for a given set of parts and production volumes. GAs are search and optimization algorithms based on the mechanics of natural genetics and natural selection. They have been recognised as a powerful method for obtaining near-optimal solutions to combinatorial optimization problems (Davis [6], Goldberg [7], Deb [8]). We have used a GA to obtain a set of process plans for a given variety of parts and production volumes, with the objective of minimizing the total processing time, setup time and material handling time, subject to the constraint of not overloading the machines.


PROBLEM  DESCRIPTION 

The analytical model for process plan selection in our approach considers the following parameters: processing time, setup time and material handling time. The objective is to minimize these parameters. Minimizing them alone, however, may overload a few workstations shared by most process plans and cause a system bottleneck. Hence a constraint that prevents any workstation from being overloaded is used to obtain a practically feasible optimal solution. The notation used in this problem is:


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




$a$         part index, $a = 1, 2, \ldots, N$
$b_a$       batch size of part $a$
$P_a$       set of process plans for part type $a$, $P_a = \{P_{a1}, P_{a2}, \ldots, P_{an}\}$, where $P_{aj}$ is the $j$-th process plan for part $a$ and $n = |P_a|$
$k$         workstation index, $k = 1, 2, \ldots, K$
$t_{ajk}$   processing time on workstation $k$ for the process plan $P_{aj}$
$mt_{aj}$   material handling time for part $a$ associated with the $j$-th process plan of part $a$
$s_{aj}$    total setup time for $P_{aj}$ for a batch size of $b_a$
$ws_k$      maximum allowable time on workstation $k$
$o$         objective index: minimizing processing time (1), setup time (2) and material handling time (3)
$w_o$       weight of objective $o$, with $\sum_o w_o = 1$
$x_{aj}$    1 if $P_{aj}$ is selected for part $a$, 0 otherwise


The weighted cost of a process plan $P_{aj}$ is defined as

$T_{aj} = w_1 \sum_k t_{ajk} b_a + w_2 s_{aj} + w_3 \, mt_{aj} b_a$

The objective is to minimize the total weighted process plan cost ($T$) for a set of process plans, given by

$T = \sum_a \sum_j T_{aj} x_{aj}$    (1)

subject to the constraints

$\sum_a \sum_j t_{ajk} b_a x_{aj} \le ws_k, \quad \forall k = 1, 2, \ldots, K$    (2)

and

$\sum_j x_{aj} = 1, \quad \forall a = 1, 2, \ldots, N$    (3)

Constraint (2) prevents machines from being overloaded, and constraint (3) ensures that only one process plan per part is selected.
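As a worked illustration of the weighted cost and of constraint (2), the following Python sketch uses the rows of Table 1 that are legible for parts 1 to 3 and the weights {0.6, 0.2, 0.2} reported in the experiment. The data layout and the function names are assumptions made for this sketch, not part of the original model.

```python
# Parts 1-3 of Table 1: batch size and, per plan,
# (processing times on WS1..WS4, setup time, material handling time).
PARTS = {
    1: (15, [((0, 24, 18, 0), 60, 5),
             ((10, 10, 8, 9), 90, 15),
             ((15, 0, 10, 14), 75, 9)]),
    2: (12, [((8, 10, 9, 10), 100, 16),
             ((0, 12, 10, 17), 70, 12),
             ((15, 0, 5, 10), 80, 10)]),
    3: (18, [((13, 16, 0, 16), 85, 18),
             ((0, 15, 11, 12), 65, 14)]),
}
WEIGHTS = (0.6, 0.2, 0.2)   # w1 (processing), w2 (setup), w3 (handling)


def weighted_cost(part, plan, weights=WEIGHTS):
    """T_aj = w1*sum_k(t_ajk)*b_a + w2*s_aj + w3*mt_aj*b_a (plan is 1-based)."""
    batch, plans = PARTS[part]
    times, setup, handling = plans[plan - 1]
    w1, w2, w3 = weights
    return w1 * sum(times) * batch + w2 * setup + w3 * handling * batch


def workstation_loads(selection):
    """Left-hand side of constraint (2) for a {part: plan} selection."""
    loads = [0, 0, 0, 0]
    for part, plan in selection.items():
        batch, plans = PARTS[part]
        times = plans[plan - 1][0]
        for k, t in enumerate(times):
            loads[k] += t * batch
    return loads


for j in (1, 2, 3):
    print(j, round(weighted_cost(1, j), 2))    # 405.0, 396.0, 393.0
print(workstation_loads({1: 3, 2: 3, 3: 2}))   # [405, 270, 408, 546]
```

For part 1, plan 3 has the lowest weighted cost under these weights, and the example selection keeps every workstation load below the limit of 1000 time units used in the experiment.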


GA  BASED  SOLUTION  METHODOLOGY 

For a given set of $N$ parts with $n_a$ process plans for part $a$, the number of feasible solutions is given by $\prod_{a=1}^{N} n_a$. Thus for 6 parts with 5 process plans each, $5^6 = 15625$ solutions would need to be searched exhaustively to find the optimal solution, and the search space grows combinatorially as the number of parts increases. Genetic algorithms have been found to be effective search and optimization algorithms (Deb [8]) for complex engineering optimization problems. They mimic the principles of natural selection (reproduction) and natural genetics (crossover and mutation) to constitute search and optimization procedures. A set of initial feasible solutions is selected randomly, and fitness values proportionate to their objective function values are evaluated. Solutions with higher fitness values are selected with higher probability for the next generation to undergo crossover and mutation, in keeping with Darwin's principle of survival of the fittest. Since GAs start with a set of initial points in the search space, near-optimal solutions are obtained over successive generations. The objective function to be minimized in our algorithm is:


$OB = \sum_a \sum_j T_{aj} x_{aj} + (mc \times M)$

where $mc$ is the number of workstations that are overloaded and $M$ is a large scalar value that penalizes OB by increasing it to higher values.

The fitness function evaluated for a solution in the GA is given by:

$F = S / (1 + OB)$

where $S$ is a suitable scalar such that $0 < F < 100$.
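The size of the search space, the penalized objective and the fitness can be sketched as follows. The function names, the cost values and the value chosen for M are hypothetical and serve only to illustrate the formulas.

```python
import math


def search_space_size(plans_per_part):
    """Number of ways to pick one plan per part: the product of the n_a."""
    return math.prod(plans_per_part)


def penalized_objective(selected_costs, loads, limits, big_m):
    """OB = sum of the selected T_aj + mc * M, mc = number of overloaded workstations."""
    mc = sum(1 for load, limit in zip(loads, limits) if load > limit)
    return sum(selected_costs) + mc * big_m


def fitness(ob, scale):
    """F = S / (1 + OB); a smaller OB gives a larger fitness."""
    return scale / (1.0 + ob)


print(search_space_size([5] * 6))                  # 15625
costs = [393.0, 410.0, 367.0]                      # hypothetical T_aj values
limits = [1000] * 4
ok = penalized_objective(costs, [405, 270, 408, 546], limits, big_m=10_000)
bad = penalized_objective(costs, [1200, 270, 408, 1001], limits, big_m=10_000)
print(ok, bad)                                     # 1170.0 21170.0
```

With a sufficiently large M, any selection that overloads a workstation receives a much lower fitness than a feasible one, so the penalty steers the search back toward feasible selections.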




EXPERIMENT  AND  RESULTS 

To test the proposed GA approach, we considered the problem given in Table 1. A part mix of 8 different parts was considered, each with a different batch size and its own set of process plans. The shop floor has 4 workstations, and each was allowed a maximum machining time of 1000 units. The objective is to select an optimal set of process plans with the minimum objective function OB from a solution space of 139,968,000. Weights $w_o = \{0.6, 0.2, 0.2\}$ were used for the objectives. The GA was then applied to the problem with a population size of 10 for 30 generations. Single-point crossover was used, and mutation was performed on each string produced by crossover with the given mutation probability. The convergence of the GA to the optimum solution for different sets of GA operators is shown in Figure 1. GA1, with crossover probability 0.6 and mutation probability 0.1, converges faster to the optimum than GA2, with crossover probability 0.1 and mutation probability 0.8. The sets of process plans selected by GA1 and GA2 are {3, 3, 2, 4, 2, 3, 2, 3} and {2, 3, 2, 4, 2, 2, 4, 3} respectively. The convergence of GA1 to the optimum is attributed to the fact that higher crossover rates result in better solutions. When all objectives were given equal weights, $w_o = \{0.33, 0.33, 0.33\}$, the optimal set of process plans selected was {2, 3, 2, 4, 2, 3, 4, 3}.
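A GA of the kind described (one gene per part holding the selected plan, fitness-proportionate selection, single-point crossover, mutation of the resulting strings) might look like the sketch below. It is a sketch under stated assumptions rather than the authors' code: the cost table is hypothetical, the overload penalty is omitted for brevity, and the best string seen so far is tracked separately.

```python
import random

# Hypothetical weighted costs T_aj: COSTS[a][j] for part a, plan j (0-based).
COSTS = [
    [405.0, 396.0, 393.0],
    [310.0, 290.0, 305.0],
    [500.0, 470.0],
    [120.0, 150.0, 110.0, 140.0],
]
SCALE = 1000.0  # the scalar S in F = S / (1 + OB)


def fitness(chromosome):
    ob = sum(COSTS[a][j] for a, j in enumerate(chromosome))
    return SCALE / (1.0 + ob)


def single_point_crossover(p1, p2, rng):
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(chromosome, rng):
    """Replace the plan of one randomly chosen part by a random plan."""
    child = list(chromosome)
    a = rng.randrange(len(child))
    child[a] = rng.randrange(len(COSTS[a]))
    return child


def run_ga(pop_size=10, generations=30, p_cross=0.6, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(len(c)) for c in COSTS] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        weights = [fitness(c) for c in pop]
        nxt = []
        while len(nxt) < pop_size:
            # Fitness-proportionate (roulette wheel) selection of two parents.
            p1, p2 = rng.choices(pop, weights=weights, k=2)
            if rng.random() < p_cross:
                p1, p2 = single_point_crossover(p1, p2, rng)
            for child in (p1, p2):
                if rng.random() < p_mut:
                    child = mutate(child, rng)
                nxt.append(list(child))
        pop = nxt[:pop_size]
        best = max(pop + [best], key=fitness)
    return best


best = run_ga()
print(best, round(fitness(best), 3))
```

Because gene a always indexes the plans of part a, single-point crossover cannot produce an invalid plan number, so no repair step is needed.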


Table  1.  Data  for  the  process  plan  selection  problem. 


Part No   Batch Size   Process Plan   WS1   WS2   WS3   WS4   Setup Time   Material Handling Time
1         15           1              0     24    18    0     60           5
                       2              10    10    8     9     90           15
                       3              15    0     10    14    75           9
2         12           1              8     10    9     10    100          16
                       2              0     12    10    17    70           12
                       3              15    0     5     10    80           10
3         18           1              13    16    0     16    85           18
                       2              0     15    11    12    65           14

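Part 4 (batch size 8) and following entries, in source order with column alignment lost: 1, 10, 10, 10, 2, 12, 12, 10, 3, 0, 12, 12, 4, 12, 0, 8, 12, 8, 0, 89, 15, 0, 16, 0, 80, 10, 0, 9, 10, 90, 11, 8, 5, 12, 0, 0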




[Figure: x-axis: Generations; curves: Best Fit (GA1), Avg Fit (GA1), Best Fit (GA2), Avg Fit (GA2)]

Fig. 1. Performance of the proposed GA.
(GA1: crossover probability = 0.6, mutation probability = 0.1; GA2: crossover probability = 0.1, mutation probability = 0.8)


CONCLUSION 

Process plan selection is a crucial problem in an automated manufacturing environment. Further, selecting a set of optimal process plans for a variety of parts with different production volumes is a combinatorial optimization problem, which makes exhaustive search practically infeasible. Various researchers have attempted the problem using heuristics and other optimization techniques such as Tabu Search and Simulated Annealing. In our approach, we used a GA to find an optimal set of process plans that minimizes the total processing time, setup time and material handling time while avoiding system bottlenecks by not overloading the workstations. The GA, which starts with different initial points in the search space, ultimately finds a near-optimal solution for the complex combinatorial problem of process plan selection. By varying the weights of the different objectives, different sets of optimal solutions were obtained, which eases the post-design issues that follow process plan selection in automated manufacturing systems.


REFERENCES 

1. A. Kusiak, G. Finke, 1988. Selection of Process Plans in Automated Manufacturing Systems. IEEE J. of Robotics and Automation. 4 (4), 397-402.

2.  K.  Bhaskaran,  1990.  Process  Plan  Selection.  Inter.  J.  of  Production  Research.  28  (8),  1527-1547. 

3.  H.C.  Zhang,  S.H.  Huang,  1994.  A  Fuzzy  Approach  to  Process  Plan  Selection.  Inter.  J.  of  Production 
Research.  32  (6),  1265-1279. 

4. Y. Seo, P.J. Egbelu, 1996. Process Plan Selection Based on Product Mix and Production Volume. Inter. J. of Production Research. 34 (9), 2369-2655.

5.  M.  K.  Tiwari,  and  N.  K.  Vidyarthi,  1998.  An  Integrated  Approach  to  Solving  the  Process  Plan  Selection 
problem  in  an  Automated  Manufacturing  System.  Inter.  J.  of  Production  Research.  36  (8),  2167-2184. 

6. L. Davis, 1991. Handbook of Genetic Algorithms, Van Nostrand Reinhold.

7. D. E. Goldberg, 1989. Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA.

8. K. Deb, 1996. Optimization for Engineering Design: Algorithms and Examples, Prentice Hall, New Delhi, India.




Breeding  Policies  in  Evolutionary  Approximation 
of  Optimal  Subspace 

H.M.  Huang  and  P.L.  Leung 

City  University  of  Hong  Kong,  Kowloon,  Hong  Kong,  P.R.China 

ABSTRACT 

In a variable space of very high dimension (e.g. 30 or more), the huge computational cost can prevent investigators from conducting any direct, meaningful analysis. A traditional trick is first to conduct single-variable analysis, then to combine several of the top single-fittest variables to approximate the optimal subspace. In this investigation, an evolutionary method [1,2] for optimal subspace approximation is proposed. The breeding policies of this evolutionary approximation, its scalability and its generalization have been intensively investigated. The object of study is a 30-D variable space which contains 6000 artificial individuals. In this data, 3 variables contain two donut-type data distributions, each with 3000 individuals; the remaining 27 variables contain only quasi-random data with the same value range as the donut data distributions. The donut distribution consists of two toroidal distributions (classes) which are interlocked like links in a chain. The cross-section of each distribution is Gaussian with standard deviation delta. The donut problem possesses a variety of pathological traits that can invalidate many non-complex analyses for classification. The goal of this investigation was to find the optimal subspace formed by the 3 donut variables within the 30-D variable space, in which most quasi-random variables emerge as noise variables. In order to reach this goal, various breeding policies were implemented and compared. Although no perfect solution for the approximation was found, various breeding policies and their impact on decreasing the error were studied. These were found to be usable for reference and might be improved when used in a practical application.


INTRODUCTION 

In high-dimension space, statistical analysis of the distribution characteristics of variables is a widely applied and popular technique. In statistical pattern classification [1], different patterns are usually discriminated by their respective statistical characteristics, such as mean difference, variance, etc. Before any practical data is fed to a pattern recognition system, plentiful relevant data must be collected and comprehensive analyses must be performed on the data. Among all the techniques used to design an appropriate pattern classifier, feature selection and extraction [4] are at the core.

Feature extraction is a technique for projecting samples from a high-dimension space onto lower dimensions, especially onto a plane. In a plane or a 3-D space, sample distribution configurations can be depicted visually, so feature extraction techniques are the quintessence of data visualisation for high-dimension spaces. In feature extraction, the final low-dimension variables usually carry information from every original variable in the high-dimension space. In feature selection, by contrast, only some of the original variables are kept and unrelated variables are simply neglected. The approximation of an optimal subspace includes aspects of both feature extraction and feature selection.

Genetic algorithms are an advanced technique applied to optimization problems, abstract and real object recognition, pattern classification, etc. They have received intensive attention in recent years. Different phrases, such as evolutionary algorithms, evolutionary strategies and evolutionary programming, have been devised by investigators with different backgrounds. These phrases carry almost the same meaning as genetic algorithms because they are all born from the idea of biological evolutionary theory.

In this paper, an evolutionary approximation method for optimal subspace selection is derived by combining the schemes of genetic algorithms (GA) and basic event generation (BEG). The BEG method is a non-parametric pattern classifier. This evolutionary method is applied to feature selection for 30-D samples and is used as a classifier for two interlocked 3-D doughnuts.

SAMPLE  DISTRIBUTION  CHARACTERISTICS 

The space under investigation is assumed to have 30 dimensions. In this paper, the samples are artificial data composed of 6000 individuals. In this data, 3 variables contain two donut-type distributions, each containing 3000 individuals; the other 27 variables contain quasi-random data with the same value range as the donut distribution data. The samples in the two donut distributions are designated as class A and class B respectively. The mean centres of class A and class B are at the points $Mean_A$ (3.0, 3.0, 3.0) and $Mean_B$ (4.5, 3.0, 3.0), and the distributions of A and B are centred on two interlocked circles:


Distribution centre of class A:

$(x_A - 3.0)^2 + (y_A - 3.0)^2 = 2.25, \quad z_A = 3.0$

Distribution centre of class B:

$(x_B - 4.5)^2 + (z_B - 3.0)^2 = 2.25, \quad y_B = 3.0$


The distribution of class A or B is Gaussian with standard deviation delta ($\sigma = 0.30$). The distribution functions for samples of class A and B were respectively:

$f_A(r_A(x,y,z)) = \frac{1}{\sqrt{2\pi\sigma}} e^{-r_A^2 / (2\sigma^2)} = \frac{1}{\sqrt{2\pi \times 0.3}} \exp\left(-\frac{(z-3.0)^2 + \left(1.5 - \sqrt{(x-3.0)^2 + (y-3.0)^2}\right)^2}{2 \times 0.3^2}\right)$

$\quad = 0.728 \exp\left(-5.56\left[(x-3.0)^2 + (y-3.0)^2 + (z-3.0)^2 - 3\sqrt{(x-3.0)^2 + (y-3.0)^2} + 2.25\right]\right)$

$f_B(r_B(x,y,z)) = 0.728 \exp\left(-5.56\left[(x-4.5)^2 + (y-3.0)^2 + (z-3.0)^2 - 3\sqrt{(x-4.5)^2 + (z-3.0)^2} + 2.25\right]\right)$

where $(x,y,z)$ is a sample point, and $r_A(x,y,z)$ and $r_B(x,y,z)$ are the distances between the point $(x,y,z)$ and the nearest point on the distribution centre of class A and of class B respectively.
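The distances to the two centre circles and the density as printed can be checked with a small Python sketch. The function names and the sampling routine are assumptions for illustration; in particular, the text does not state how the artificial samples were drawn, so `sample_a` is only one plausible construction.

```python
import math
import random

SIGMA = 0.3
RADIUS = 1.5  # sqrt(2.25)


def r_a(x, y, z):
    """Distance to the class A circle (plane z = 3, centre (3, 3, 3))."""
    return math.hypot(z - 3.0, RADIUS - math.hypot(x - 3.0, y - 3.0))


def r_b(x, y, z):
    """Distance to the class B circle (plane y = 3, centre (4.5, 3, 3))."""
    return math.hypot(y - 3.0, RADIUS - math.hypot(x - 4.5, z - 3.0))


def density(r, sigma=SIGMA):
    """Density as printed in the text: prefactor 1/sqrt(2*pi*sigma), about 0.728."""
    return math.exp(-r * r / (2.0 * sigma * sigma)) / math.sqrt(2.0 * math.pi * sigma)


def sample_a(rng, sigma=SIGMA):
    """Assumed sampler: a point on the class A circle plus Gaussian offsets."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    rho = RADIUS + rng.gauss(0.0, sigma)       # radial offset in the xy-plane
    return (3.0 + rho * math.cos(theta),
            3.0 + rho * math.sin(theta),
            3.0 + rng.gauss(0.0, sigma))       # offset along z


rng = random.Random(1)
point = sample_a(rng)
print(round(density(0.0), 3))                  # 0.728
print(r_a(4.5, 3.0, 3.0), r_b(3.0, 3.0, 3.0))  # 0.0 0.0
```

The last line shows the interlocking: the mean centre of each class lies exactly on the centre circle of the other class.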

The distance between the means of class A and class B is 1.5. Within a distance of $1.5/2 = 0.75 = 2.5\sigma$, the samples of the two classes do not overlap. From the properties of the Gaussian distribution, the overlap rate is about 1.24%, so an ideal classifier should achieve a maximum correct recognition rate of 98.76%.
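The figure of 1.24% corresponds to the two-sided tail probability of a Gaussian beyond 2.5 standard deviations, which can be verified with the Python standard library (the function name is chosen for this sketch):

```python
import math


def two_sided_tail(k):
    """P(|Z| > k) for a standard normal variable Z."""
    return math.erfc(k / math.sqrt(2.0))


overlap = two_sided_tail(0.75 / 0.3)       # 0.75 = 2.5 sigma
print(round(100 * overlap, 2))             # 1.24
print(round(100 * (1 - overlap), 2))       # 98.76
```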


EVOLUTIONARY  STRATEGY 

Suppose there exist $N$ ($N = 6000$) individuals, each containing $m$ ($m = 30$) variables, in which the $i$-th individual can be expressed as $X_i = (x_{1i}, x_{2i}, \ldots, x_{mi})$, $i = 1, 2, \ldots, N$, or more simply as $X_i = (x_j)_m$, $i = 1, 2, \ldots, N$. Then the $N$ (individuals) $\times$ $m$ (variables) data can be expressed as a matrix $X = (x_{ji})_{m \times N}$. These $N$ samples come evenly from classes A and B, i.e., the numbers of class A and class B samples are $N_A$ and $N_B$, with $N_A = N_B = N/2$.


For $m = 30$, in order to search for optimal subspaces for each class, each variable should be divided into at least two parts, so the number of subspaces in one dimension is at least 3, and the number of possible subspaces for 30 variables is at least $3^{30} \approx 2.1 \times 10^{14}$. Clearly, an exhaustive strategy is impossible. The proposed evolutionary approximation method for the optimal subspace is intended to circumvent this heavy computational requirement. A traditional trick is first to conduct single-variable analysis, then to combine several of the top single-fittest variables to approximate an optimal subspace. This strategy may sometimes achieve a solution but, more often, becomes trapped in a local optimum.
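The lower bound on the number of candidate subspaces quoted for 30 variables is simple to confirm:

```python
count = 3 ** 30
print(count)            # 205891132094649
print(f"{count:.2e}")   # 2.06e+14
```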

This algorithm generates subspace populations according to the BEG scheme and approximates optimal subspaces using the GA scheme.

1. Random generation of the initial subspace population $G_0$: the initial generation, or seed generation, includes $P_0$ individuals that represent $P_0$ subspaces, denoted $I_1, I_2, \ldots, I_i, \ldots, I_{P_0}$, where $I_i$ is a triad of three measures: variable selector (SL), centre of subspace (CT) and suburb of subspace (SU). The variable selector SL is a binary (0-1) string, $SL_i = (b_1 b_2 \cdots b_m)_i$, where $m$ is the number of variables and $b_j = 0$ or $1$. If $b_j \neq 0$, the $j$-th variable is included as a component of the $i$-th subspace. The centre of subspace, CT, is a high-order point in the subspace under consideration (order $\le m$). The suburb of subspace, SU, is a range for each valid variable. The subspace $I_i$ can represent any simple high-dimension continuum, such as a super-cuboid or a super-ellipsoid. In this work, only super-cuboids are considered.

2. Approximation of the optimal subspace: in the seed generation $G_0$ and any subsequent generation $G_l$, $l = 0, 1, 2, \ldots$, the subspace $I_i$ can be called a classifier, since it is these evolutionary subspaces that classify the sample space into the appropriate classes. In addition to the three measures of subspace $I_i$, there are two properties of $I_i$: the correct rate and the error rate. The evolutionary strategy is:

(a) At $G_l$, $l = 0, 1, 2, \ldots$, judge the validity of each subspace $I_i$ and calculate a correct rate and an error rate for each valid subspace $I_i$. Suppose $I_i$ contains a total of $Q_i$ samples, of which $Q_{iA}$ come from class A and $Q_{iB} = Q_i - Q_{iA}$ come from class B. If $Q_i > \tau_0 N$ (e.g., $\tau_0 = 0.05$), which means that subspace $I_i$ contains at least a fraction $\tau_0$ (e.g., 5%) of the total samples, subspace $I_i$ is a valid candidate for an optimal subspace. If $I_i$ is a valid candidate and $Q_{iA} > \tau_1 Q_{iB}$ (e.g., $\tau_1 = 10$), $I_i$ is a valid subspace whose correct rate is $Q_{iA}/N_A$ and error rate is $Q_{iB}/N_B$; else if $Q_{iB} > \tau_1 Q_{iA}$, $I_i$ is also a valid subspace, whose correct rate is $Q_{iB}/N_B$ and error rate is $Q_{iA}/N_A$; otherwise, $I_i$ is excluded as a valid subspace. In subsequent evolution, valid subspaces may be operated on by any one of the three GA operators: reproduction, crossover and mutation. Invalid subspaces are operated on only by mutation.

(b) For a subspace $I_i$, the reproduction operator does nothing to $I_i$; it only copies it and transfers it to the next generation. The crossover operator for $I_i$ first selects another subspace $I_j$ ($j = 1, 2, \ldots, P_0$, $j \neq i$) from the same generation, then randomly selects a crossover point $k$ ($k = 1, 2, \ldots, m$), divides $I_i$ and $I_j$ at point $k$ and crosses over the four resulting sub-strings to form two new offspring $I_{i1}$, $I_{j1}$; finally it replaces the old $I_i$, $I_j$ with the new $I_{i1}$, $I_{j1}$ in the next generation. The mutation operator for $I_i$ is somewhat more complex, since it applies to both valid and invalid subspaces. For valid subspaces, the mutation operator enlarges or shrinks the suburb of the subspace, SU; for invalid subspaces, the mutation operator uses one of four sub-operators: (i) a one-bit mutation in the variable selector SL, (ii) a one-co-ordinate shift in the centre of subspace CT, (iii) a one-co-ordinate enlargement or shrinkage in the suburb of subspace SU, or (iv) random initialization of a new subspace.




(c) Determination of the evolutionary termination condition. Because the samples in classes A and B
are distributed as two interlocked doughnuts and each evolutionary subspace is a simple continuum,
no single evolutionary subspace can be counted on to classify classes A and B perfectly. The
termination condition should therefore be based on the overall evaluation of all valid subspaces.
At the learning stage, suppose that for a sample X_t, J_A subspaces (i.e., classifiers) classify it
as class A and J_B classifiers classify it as class B. Then if J_A > τ3 J_B, X_t is classified as
class A; if J_B > τ3 J_A, X_t is classified as class B; otherwise X_t is indeterminate. The overall
correct rate, error rate and unknown rate are then available. When the overall recognition rate
(= overall correct rate - overall error rate) reaches a predefined value, the evolutionary process
terminates.
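
The voting rule and the resulting rates can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `tau3` argument (standing for the vote-ratio threshold, assumed to be at least 1 so that the two conditions cannot both hold) and the sample format are hypothetical and do not come from the authors' implementation.

```python
def classify_by_votes(votes_a, votes_b, tau3):
    """Return 'A', 'B' or 'indeterminate' from the classifier vote counts.

    Assumes tau3 >= 1, so at most one of the two conditions can be true.
    """
    if votes_a > tau3 * votes_b:
        return "A"
    if votes_b > tau3 * votes_a:
        return "B"
    return "indeterminate"


def overall_rates(samples, tau3):
    """Compute (correct rate, error rate, unknown rate).

    `samples` is a non-empty list of (true_label, votes_a, votes_b) tuples.
    """
    correct = error = unknown = 0
    for true_label, votes_a, votes_b in samples:
        decision = classify_by_votes(votes_a, votes_b, tau3)
        if decision == "indeterminate":
            unknown += 1
        elif decision == true_label:
            correct += 1
        else:
            error += 1
    total = correct + error + unknown
    return correct / total, error / total, unknown / total


def recognition_rate(samples, tau3):
    """Overall recognition rate = correct rate - error rate."""
    correct_rate, error_rate, _ = overall_rates(samples, tau3)
    return correct_rate - error_rate
```

Evolution would stop once `recognition_rate` reaches the predefined target value.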

At  different  stages  of  evolution,  selection  of  a  breeding  policy  can  significantly  affect  the  final  outcome. 


RESULTS  AND  DISCUSSION 

Various evolutionary breeding policies were assumed and implemented. The resulting total fitness (correct
recognition rate, %) and the number of variables in the optimal subspace were strongly correlated with
the selected breeding policy, such as the selection of the initial population, the determination of
classifier resolution, and the formation of the mutation and crossover operators. For example, if the
resolution of a classifier (the minimal number of samples needed to judge the validity of a classifier)
is too small, the resulting number of variables in the optimal subspace will oscillate far from the
theoretically known value; if it is too large, the analysis will achieve nothing.

The evolutionary process up to the 2784th epoch for total fitness and the number of variables in the
optimal subspace is illustrated in Fig. 1 and Fig. 2. Although the theoretical optimal fitness (98.76%)
had not been reached even after 2784 epochs, after fewer than 10 epochs the number of variables in the
optimal subspace is very stable at 3, with rare exceptions of 2 and 4. The number 3 represents the
doughnut dimensions, which carry sufficient information to discriminate the two 3D doughnuts.


Fig. 1. Total fitness (correct recognition rate, %) of distributed interlocked doughnuts versus epoch of evolution




Fig. 2. Number of variables in the optimal subspace versus epoch of evolution

In practical applications of high-dimensional feature extraction and selection, such as complex medical
data analysis and abstract object recognition, the large number of variables and the heavy computation
may hinder a complete search of the solution space. Traditional techniques often reach sub-optimal
solutions. With careful design and prudent selection of evolutionary breeding policies, an evolutionary
scheme may closely approximate the optimal subspace.


REFERENCES 

1. Thomas Bäck, 1996. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary
Programming, Genetic Algorithms, Oxford University Press, New York.

2. Zbigniew Michalewicz, 1996. Genetic Algorithms + Data Structures = Evolution Programs,
Springer-Verlag, New York.

3. Robert Schalkoff, 1992. Pattern Recognition: Statistical, Structural and Neural Approaches, John
Wiley & Sons, Inc.

4. Keinosuke Fukunaga, 1990. Introduction to Pattern Recognition, 2nd Edition, Academic Press.

5. Perambur S. Neelakanta, Dolores F. De Groff, 1994. Neural Network Modeling: Statistical Mechanics
and Cybernetic Perspectives, CRC Press, Inc.

6. Clifford Lau (Ed.), 1992. Neural Networks: Theoretical Foundations and Analysis, IEEE Press.

7. Brian D. Ripley (University of Oxford), 1996. Pattern Recognition and Neural Networks, Cambridge
University Press, Great Britain.

8. Paolo Antognetti and Veljko Milutinovic, Eds., 1991. Neural Networks: Concepts, Applications and
Implementations, Vol. I, Prentice Hall Inc., Simon & Schuster, Englewood Cliffs, New Jersey, USA.




Prediction  of  Cement  Paste  Mechanical  Behaviour 
From Chemical Composition using Genetic Algorithms
and  Artificial  Neural  Networks 


Jose  C.  Cassa,  Giovanni  Floridia,  Andre  R.  Souza,  Rodrigo  T.  Oliveira 


Grupo  de  Estudos  em  Inteligencia  Computacional  Aplicada  (GEICAP) 
DTCM  -  Escola  Politecnica  -  Universidade  Federal  da  Bahia 
Rua Aristides Novis, 02 - Federação, 40210-630 Salvador, Bahia, Brazil

e-mail:  iccassa@ufba.br 


ABSTRACT 

Computational Intelligence (CI) techniques have attracted the interest of some engineers as valid tools for
the representation of complex systems. A growing number of works show that they are also effective in
optimisation. Building materials, such as concrete and mortars, usually display complex behaviour that is
hard to model, and they seem to be an interesting area in which to explore the application of CI as a
modelling technique. This paper describes how CI can be used to model the performance of cement paste. The
specific objective was to develop models able to predict the mechanical behaviour of this material using
only data available from the chemical composition of the cement. The developed models showed the advantage
of CI over conventional techniques, leading rapidly to useful results with reasonable precision and accuracy.


INTRODUCTION 

Cement technology has advanced significantly in the last two centuries. The success of engineering
techniques that use this material intensively is a clear demonstration of the progress achieved, as well
as an alert to the building construction industry of the need for new developments. Although the
commercial value of this technology is huge, its scientific basis is still far from complete, and it is
possible to identify gaps and opportunities for new scientific developments.

In another work [1] the authors showed the validity of artificial neural networks (ANN) as a
modelling technique to predict the mechanical behaviour of concrete. Three main neural network
architectures were used: the Multi-Layer Perceptron (MLP), the General Regression Neural Network
(GRNN) and the polynomial network known as the Group Method of Data Handling (GMDH). The key
question is how to choose the best ANN architecture.

Even in a known architecture like the MLP there are many possibilities giving different MLP structures:
one can change the number of layers, the transfer function of each neuron, and the connections between
neurons. If one also allows connections to "jump" a layer and neurons to be freely connected, the MLP
name itself makes little sense, because there are no longer clear layers. A compromise is to use a
modified MLP that accepts different "slabs" in the same layer [2,3]. There are thus many candidate
architectures to choose from. Genetic algorithms can be used to automate this search for the best
network configuration. This work shows a simple application of this technique.


COMPUTATIONAL  INTELLIGENCE 

In the last decades the great majority of works on modelling using computational intelligence (CI) has
been characterised mainly by the search for methods inspired by nature, where a large number of living
systems can be regarded as intelligent. Although it is not possible to be sure that all solutions in
nature are optimised and/or intelligent, there is no doubt that they are well designed and suited to
survive in their environment. For this reason some of these systems have been adopted as new paradigms
for engineers, scientists, and


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




mathematicians in order to model, simulate and/or control complex systems. These professionals search
for new ideas that can be copied or imitated in a computer and then applied to the solution of hard
problems that science and engineering cannot yet solve satisfactorily using classical mathematical
approaches (phenomenological or others). For example, the central nervous systems of higher animals and
the processes of natural evolution inspired the development of the CI techniques called artificial
neural networks [2,3] and genetic algorithms [4], respectively.

Artificial  Neural  Networks 

Many different ANN structures and training techniques are described in the literature [3], but multi-layer
perceptrons (MLP) seem suitable to model complex systems. In simple terms, this type of ANN works like a
non-linear regression approach. As with any curve-fitting technique, model parameters can be found by least
squares optimisation. In multi-layer perceptron networks, processing elements are arranged into
interconnected layers. There is an input layer to receive data, one or more hidden layers, and finally an
output layer to transmit the result of the network calculations. The hidden layer is essential to represent
non-linear processes.
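
As a rough illustration of this layered structure, the sketch below computes the output of a network with one hidden layer and evaluates the least-squares criterion. It is a minimal example, assuming logistic transfer functions in both layers; the function names and the weight layout are hypothetical and are not taken from any software used in this work.

```python
import math


def logistic(v):
    """Logistic (sigmoid) transfer function."""
    return 1.0 / (1.0 + math.exp(-v))


def layer_output(inputs, weights, biases, transfer):
    """Output of one fully connected layer.

    weights[j][i] is the (hypothetical) weight from input i to neuron j.
    """
    return [
        transfer(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def mlp_forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Forward pass: input layer -> one hidden layer -> output layer."""
    hidden = layer_output(inputs, hidden_w, hidden_b, logistic)
    return layer_output(hidden, output_w, output_b, logistic)


def sum_squared_error(predicted, observed):
    """Least-squares criterion that training tries to minimise."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))
```

Training then amounts to adjusting the weights and biases so that `sum_squared_error` is as small as possible over the training data.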

Although MLP neural networks can solve many modelling problems, other ANN architectures [3] such as
General Regression Neural Networks (GRNN), Probabilistic Neural Networks (PNN), and Polynomial
Networks (GMDH) can often do the same job more easily [1].

The ANN designer must specify the number of hidden layers (usually one), the number of nodes in each layer,
and the transfer function of each layer. This is called the ANN "architecture". As in regression methods,
expertise, judgement and trial and error are used to define a suitable architecture. Once the ANN
architecture is selected, the weights have to be adjusted by finding the values that minimise, for example,
the sum of squared deviations between the ANN and process outputs.

The main difficulty for engineers and scientists unfamiliar with non-linear modelling is that understanding
and applying ANN concepts requires a significant investment in software, training and development of the
heuristics of neural programming. For many end users this looks impracticable. It would therefore be very
useful if an automatic computer system could represent the knowledge of a neural programmer in order to find
acceptable ANN solutions. Recent progress in Evolutionary Computing has made this possible through Genetic
Algorithms [4].

Genetic  Algorithms 

Genetic Algorithms (GAs) are search algorithms based on the mechanics of natural selection and natural
genetics [4]. They combine survival of the fittest among string structures with a structured but randomised
information exchange to form a search algorithm with some characteristics of human search. Any possible
solution must be represented as a string of bits called a "chromosome", a clear metaphor of its biological
counterpart. These strings are also called "individuals" or "artificial creatures". The process relies on a
clear coding scheme, and each "creature" is a candidate solution under that scheme. The procedure is to
create a population of candidates that will compete: well-adapted individuals will grow in number, while
poorly adapted ones will vanish.

In every generation, a new set of artificial creatures (strings) is generated using bits and pieces of the
fittest of the old; occasionally a new part is tried for good measure. GAs efficiently exploit historical
information to speculate on new search points with expected improved performance.

GAs differ from more conventional optimisation and search procedures in four ways. They:

a)  work with a coding of the parameters, not the parameters themselves

b)  search  from  a  population  of  points,  not  a  single  point 

c)  use  objective  function  information,  not  derivatives 

d)  use  probabilistic  transition  rules,  not  deterministic  rules. 

A  simple  GA  that  yields  good  results  in  many  practical  problems  is  composed  of  three  operators:  reproduction, 
crossover  and  mutation. 

Reproduction is a process in which individual strings are copied according to their objective function
values (the fitness function). This function f can be some measure of profit, utility or goodness that we
want to maximise or minimise. Copying strings according to their fitness values means that fitter strings
have a higher probability of contributing one or more offspring to the next generation. This operator is an
artificial version of natural selection, the survival of the fittest among string creatures.

How the crossover and mutation operators work is illustrated in the following example. A classic problem of
model optimisation is the search for the best regression equation to represent a system. As an example,
suppose that a model relates an output z to two inputs x and y, so that

$$z = F(x, y) \tag{1}$$

where  F  is  any  linear  or  non-linear  function  or  transformation  of  original  variables  x  and  y  that  can  be  useful  in 
the  model,  such  as: 

$$z = G(x, y, x^2, y^2, x^3, y^3, xy, \ldots) \tag{2}$$

where G is a linear weighting of the terms (in this case we are seeking a model that is linear in the
parameters). Once all possible terms in the model are chosen, it is possible to produce a random population
of models in binary form, where 1 indicates inclusion of the corresponding term and 0 indicates that the
term is not used. For instance, the following strings correspond to three members of the population derived
from the seven terms listed in Eq. 2:

1101001  =>  x, y, y^2 and xy
0110010  =>  y, x^2 and y^3
1110010  =>  x, y, x^2 and y^3
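
The binary coding can be made concrete with a short sketch that maps a bit string to the model terms it selects and evaluates those terms for given inputs. The names `TERMS`, `decode` and `design_row` are illustrative choices, not part of the original work; the term order follows Eq. 2.

```python
# Candidate terms in the order they appear in Eq. 2, each paired with a
# function that evaluates the term for inputs x and y.
TERMS = [
    ("x", lambda x, y: x),
    ("y", lambda x, y: y),
    ("x^2", lambda x, y: x ** 2),
    ("y^2", lambda x, y: y ** 2),
    ("x^3", lambda x, y: x ** 3),
    ("y^3", lambda x, y: y ** 3),
    ("xy", lambda x, y: x * y),
]


def decode(chromosome):
    """Return the names of the terms switched on by a 7-bit string."""
    if len(chromosome) != len(TERMS):
        raise ValueError("chromosome length must match the number of terms")
    return [name for bit, (name, _) in zip(chromosome, TERMS) if bit == "1"]


def design_row(chromosome, x, y):
    """Values of the selected terms at (x, y): one row of a regression matrix."""
    if len(chromosome) != len(TERMS):
        raise ValueError("chromosome length must match the number of terms")
    return [func(x, y) for bit, (_, func) in zip(chromosome, TERMS) if bit == "1"]
```

Rows produced by `design_row` for each observation could then be passed to any least-squares routine to fit the weights of the selected terms.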

Each of these models can be fitted to part of a data set by least squares, and the model's predictive
ability can be tested with the remaining data. The models are then ranked by their prediction error
(according to some adopted criterion, e.g., R^2 or MSE).

It is possible to create a GA strategy where half of the population of models is allowed to live and breed
at each step (generation). Pairs of these models are randomly selected for breeding. A crossover point is
chosen at random for each pair of models and the genes beyond it are swapped. Suppose that the first and
second models were chosen for breeding and the crossover point fell after the fifth bit. The result would be

11010|01  =>  1101010
01100|10  =>  0110001

where the two new members of the population are shown at right. Other breeding schemes can be adopted [5].
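
A single-point crossover of this kind can be sketched in a few lines. The function names are illustrative; the offspring shown in the example above are reproduced when the cut falls after the fifth bit.

```python
import random


def crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length bit strings after position `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def random_crossover(parent_a, parent_b, rng=random):
    """Choose the crossover point at random, leaving at least one bit on each side."""
    point = rng.randint(1, len(parent_a) - 1)
    return crossover(parent_a, parent_b, point)
```

For instance, `crossover("1101001", "0110010", 5)` yields the two offspring listed above.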

The mutation operator follows a similar bit-by-bit process. It is applied to selected individuals of the
new generation, which may themselves be the result of the crossover operator. For each bit position in
turn, a random number between 0 and 1 is generated. If the mutation probability is greater than this random
number, the value of the bit is changed (from 0 to 1 or from 1 to 0). This is repeated until the end of the
bit string. As an example, applying the mutation operator to the third member of the population described
above, with the third and fifth bits changed, gives

1110010  =>  1100110
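
The bit-flip mutation can be sketched as below. `mutate` applies the probabilistic rule to every bit, while `flip_positions` is a deterministic helper (a hypothetical name, using 1-based positions) that reproduces the example above when the third and fifth bits are flipped.

```python
import random


def flip(bit):
    """Change a bit character from '0' to '1' or from '1' to '0'."""
    return "1" if bit == "0" else "0"


def mutate(chromosome, mutation_probability, rng=random):
    """Flip each bit whose random draw is below the mutation probability."""
    return "".join(
        flip(bit) if rng.random() < mutation_probability else bit
        for bit in chromosome
    )


def flip_positions(chromosome, positions):
    """Flip the bits at the given 1-based positions (deterministic helper)."""
    chosen = set(positions)
    return "".join(
        flip(bit) if index in chosen else bit
        for index, bit in enumerate(chromosome, start=1)
    )
```

In practice the mutation probability is kept small so that most bits are copied unchanged.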

The overall process stops when there is no further improvement after several generations. It is then assumed
that optimum conditions have been reached. Using this approach it is possible to define a GA-based search
algorithm to optimise a defined objective function f = f(s), for example one that depends on the ANN
architecture (the type of ANN architecture, the number of hidden layers, the number of neurons in the hidden
layer, the most suitable transfer function in each layer and the weights/connections between neurons).

THE  CONSIDERED  PROBLEM 

Predicting  the  flexural  mechanical  strength  of  cement  pastes 

Several researchers have attempted to predict the mechanical properties of cement paste and/or concrete from
chemical composition. Unfortunately, the results were not encouraging: the mathematical models developed
did not show the reliability and precision required in this type of work.




Dragicevic and Rsumovic [6] proposed a statistical model for this purpose using D-Optimal Design and
regression analysis. However, basic overfitting errors (a perfect fit due to zero degrees of freedom)
invalidate any usefulness of the developed model, as it has no ability to generalise (the model was only
trained to reproduce the original training data set). Although there are mistakes in the modelling approach
of that paper, its experimental results are believed to be consistent and reliable, and were used in this
paper to demonstrate the application of computational intelligence.

In the original work, one area of interest was defined in which four types of cement were used (z1, z2, z3
and z4), with the mineralogical composition estimated from the chemical composition of the cements (Bogue
formulae), as shown in Table 1.


Table 1: Potential mineralogical composition (%) of cements used.

              z1       z2       z3       z4
x1 - C3S     73.04    40.95    73.73    53.34
x2 - C2S      6.89    41.05     7.43    21.04
x3 - C3A     10.05     4.82     5.89    13.59
x4 - C4AF    10.02    13.18    12.95    12.03

An experimental database relating different mixtures of the cements (z1, z2, z3 and z4) to the flexural
mechanical strength of cement pastes at 28 days of age was developed from information extracted from [6],
using a design of experiments technique called "D-Optimal Design".

The potential mineralogical composition of another cement (x1, x2, x3 and x4) can be calculated from its
chemical composition or measured directly using X-ray techniques, microscopy or another quantitative
mineralogical method. By using matrix algebra, the values of potential mineralogical composition (x1, x2, x3
and x4) can be converted into equivalent mixtures of the cements used in this study (z1, z2, z3 and z4) from
the equations:

$$z_1 = 5.517x_1 + 5.600x_2 + 0.484x_3 - 34.660x_4 \tag{4.1}$$

$$z_2 = 1.300x_1 + 4.051x_2 - 0.627x_3 - 7.623x_4 \tag{4.2}$$

$$z_3 = -2.295x_1 - 5.481x_2 - 5.811x_3 + 26.324x_4 \tag{4.3}$$

$$z_4 = -3.523x_1 - 3.203x_2 + 10.921x_3 + 16.926x_4 \tag{4.4}$$
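
One way to carry out the matrix-algebra step is to solve the linear system directly from the compositions in Table 1. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes that the phase composition x of a cement is the mixture-weighted sum of the four reference compositions (x = A z, with the columns of A taken from Table 1 and expressed as fractions), and it uses a small Gaussian elimination routine written for this example. The names `COMPOSITION`, `solve` and `equivalent_mixture` are hypothetical.

```python
# Rows: phases x1 (C3S), x2 (C2S), x3 (C3A), x4 (C4AF).
# Columns: reference cements z1..z4. Values from Table 1 divided by 100.
COMPOSITION = [
    [0.7304, 0.4095, 0.7373, 0.5334],
    [0.0689, 0.4105, 0.0743, 0.2104],
    [0.1005, 0.0482, 0.0589, 0.1359],
    [0.1002, 0.1318, 0.1295, 0.1203],
]


def solve(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Bring the row with the largest pivot candidate into position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - known) / aug[r][r]
    return z


def equivalent_mixture(x):
    """Mixture fractions (z1..z4) reproducing phase fractions x = (x1..x4)."""
    return solve(COMPOSITION, x)
```

As a sanity check, passing the composition of cement z1 itself returns a mixture that is, to rounding error, 100% z1.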

The major objective is to develop a prediction model based on ANN that allows the calculation of the
flexural mechanical strength (FS) of a cement paste at 28 days from knowledge of its chemical /
mineralogical composition, so that

$$FS = f(x_1, x_2, x_3, x_4) \tag{5}$$


TRAINING  THE  NETWORK 

Using  Backpropagation 

The architecture of a Multi-Layer Perceptron (MLP) has to be adapted to the problem to be solved. The number
of inputs to the network and the number of neurons in the output layer are both constrained by the problem.
Backpropagation (BP) training follows gradient descent on the error surface to minimise the network error,
and local minima may trap the network. The more neurons in the intermediate layers, the more freedom a
network has and the more variables there are to optimise. Hecht-Nielsen demonstrated that a hidden layer
with (2n + 1) neurons, where n is the number of input variables, can represent any continuous mathematical
function [2].

The best network scored an MSE of 2.77%, using a simple fully connected MLP. In polynomial regression the
best generalisation is often achieved with incomplete polynomials. ANN regression can show similar
behaviour: the best generalisation may be achieved with an MLP that is not fully connected. Genetic
Algorithms can be a good tool for managing the difficulty of finding such architectures.




Using  Genetic  Algorithms 

Training an MLP network using backpropagation is time-consuming even with fast computers, and it requires
some heuristic knowledge of neural programming. GAs offer the possibility of automatic searching to ease
the work of finding an acceptable ANN solution. NeuroGenetic Optimizer (NGO) [7] is a commercial software
package based on this principle. The NGO process consists of:

a)  Creating an initial population of genotypes (genetic representations of the neural networks able to
solve the problem of interest)

b)  Building  neural  networks  (phenotypes)  based  on  the  genotypes 

c)  Training  and  testing  the  neural  networks  to  determine  how  fit  they  are 

d)  Comparing  the  fitness  of  the  networks  and  keeping  the  best  (Top  10) 

e)  Selecting  those  networks  in  the  population  that  are  better,  discarding  those  which  are  not 

f)  Refilling  the  population  back  to  the  defined  size 

g)  Pairing  up  the  genotypes  of  the  neural  networks 

h)  Mating  the  genotypes  by  exchanging  genes  (features)  of  the  networks 

i)  Mutating  the  genotypes  in  some  random  fashion. 

The process then returns to step b) and continues until some stopping criterion is reached or the
process is stopped manually.
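
The loop formed by steps a) to i) can be sketched generically as below. This is not the NGO implementation, which is commercial software; it is a toy search over bit-string genotypes in which a user-supplied `fitness` function stands in for building, training and testing a network. All names and default values are assumptions made for illustration, and the sketch requires a population of at least four and genotypes of at least two bits.

```python
import random


def evolve(fitness, genome_length, population_size=20, generations=30,
           mutation_probability=0.05, seed=0):
    """Toy genetic search; `fitness` maps a list of 0/1 genes to a score."""
    rng = random.Random(seed)
    # a) create an initial population of genotypes
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # b)-d) evaluate every candidate and remember the best seen so far
        ranked = sorted(population, key=fitness, reverse=True)
        if fitness(ranked[0]) > fitness(best):
            best = ranked[0]
        # e) keep the better half, discard the rest
        survivors = ranked[:population_size // 2]
        # f)-h) refill the population by pairing survivors and exchanging genes
        children = []
        while len(survivors) + len(children) < population_size:
            mother, father = rng.sample(survivors, 2)
            point = rng.randint(1, genome_length - 1)
            children.append(mother[:point] + father[point:])
        # i) mutate the new genotypes at random
        for child in children:
            for i in range(genome_length):
                if rng.random() < mutation_probability:
                    child[i] = 1 - child[i]
        population = survivors + children
    return max(population + [best], key=fitness)
```

A fixed number of generations is used here as the stopping criterion; a test for lack of improvement over several generations could be substituted.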

Through this process, the better networks survive and their features are carried forward into future
generations and/or combined with others to find better networks for the desired application. This genetic
search is much more effective than random searching, as the genetic process of recombining features greatly
improves the speed of identifying highly fit networks. It also has a potential advantage over relying on
personal experience alone in building neural networks, as new and potentially better solutions may be found
through this process than would be found under the nearly unavoidable assumptions made by the user.

Neural network fitness is computed by applying the neural node influence (NNI) factors and, optionally,
Learning Ability Compensation to the network's test accuracy. When using NNI, the network's test accuracy
is adjusted according to the equation:

Net_Fitness = Test_Accuracy * (1 +/- 0.5 * ((Input_Node_Influence * (1 - Nbr_Inputs / Nbr_Possible_Inputs))
              + (Hidden_Node_Influence * (1 - Nbr_Hiddens / Max_Hiddens))))                              (6)

Accordingly,  smaller  networks  are  rewarded  by  an  amount  proportional  to  the  input  and  hidden  node 
influence.  The  NGO  formula  for  Learning  Ability  Compensation  is  proprietary  and  confidential. 
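
Eq. 6 can be written as a small function, sketched below. The argument names mirror the symbols in the equation; the `sign` argument is an assumption introduced here to represent the "+/-" in Eq. 6, whose selection rule is not stated in the text. Learning Ability Compensation is not included, since its formula is not published.

```python
def net_fitness(test_accuracy, nbr_inputs, nbr_possible_inputs,
                nbr_hiddens, max_hiddens,
                input_node_influence, hidden_node_influence, sign=+1):
    """Adjust test accuracy for network size following Eq. 6.

    `sign` (+1 or -1) stands for the '+/-' in the equation; which one
    applies in a given case is an assumption left to the caller.
    """
    size_term = (
        input_node_influence * (1 - nbr_inputs / nbr_possible_inputs)
        + hidden_node_influence * (1 - nbr_hiddens / max_hiddens)
    )
    return test_accuracy * (1 + sign * 0.5 * size_term)
```

With both influence factors set to zero, or with a network that uses every possible input and the maximum number of hidden nodes, the function returns the unadjusted test accuracy.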

General  Regression  Neural  Networks 

General regression neural networks (GRNN) are feedforward networks based on probability density
functions. GRNN train fast and perform well provided that enough experimental data are available.
This network was developed in the statistical literature as kernel regression and rediscovered later as a
new ANN architecture [11]. Its topology consists of four layers: the input layer, a hidden layer working as
a classifier, a layer of summation (addition) neurons and the output layer. Training is completed in one
step, in which the training set is copied into the hidden layer. The summation neurons process the kernel
function. The network matches any new input to the nearest ones available in the classifier and then
presents the output response. The weights are smoothness factors that can be trained and calibrated using
GAs.
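
The kernel-regression view of a GRNN can be illustrated with the sketch below, which assumes a Gaussian kernel and a single smoothness factor shared by all inputs. The function and argument names are hypothetical, and this is the textbook form of the estimator rather than the implementation in any particular software package.

```python
import math


def grnn_predict(query, train_inputs, train_targets, smoothness):
    """Kernel-weighted average of the stored training targets.

    Each stored pattern contributes with weight exp(-d^2 / (2 * s^2)),
    where d is its Euclidean distance to `query` and s is the smoothness
    factor. If s is very small and no pattern is close to the query, all
    weights can underflow to zero and the division below will fail.
    """
    weights = []
    for pattern in train_inputs:
        dist_sq = sum((q - p) ** 2 for q, p in zip(query, pattern))
        weights.append(math.exp(-dist_sq / (2.0 * smoothness ** 2)))
    total = sum(weights)
    return sum(w * t for w, t in zip(weights, train_targets)) / total
```

A small smoothness factor makes the prediction follow the nearest stored pattern closely, whereas a large one averages over many patterns.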

Polynomial  Neural  Networks 

Polynomial neural networks, also called the Group Method of Data Handling (GMDH), were invented by
Ivakhnenko in Russia and were later used as neural networks and enhanced by others.

GMDH works by building successive layers with complex links (or connections) that are individual terms of
a polynomial. These polynomial terms are created by using linear or non-linear regression. The initial layer
is simply the input layer. The first layer created is made by computing regressions of the input variables
and then choosing the best ones. The second layer is created by computing regressions of the values in the
first layer along with the input variables. Once again, the best are chosen using a convenient algorithm
(e.g., GAs). They are called survivors. This process continues until the network stops getting better
(according to a specified criterion).

The  resulting  network  can  be  represented  as  a  complex  polynomial  description  of  the  model.  In  some 
respects,  it  is  like  using  regression  analysis  but  is  far  more  powerful.  GMDH  can  build  very  complex 
models  while  avoiding  overfitting  problems. 

NEUROSHELL GMDH [11] can recognise the most significant variables as it trains, and will display a list
of them. The software also has facilities to select the expected degree of model non-linearity (off, low,
medium or high) as well as the model diversity, through the maximum number of survivors (low, medium
or high). The length of the model is associated with its complexity (low, medium or high). The final model
can also be optimised by eliminating the less significant parameters.


RESULTS 

Different ANN architectures were tested using MLP combined with GAs. The idea was to investigate all
possible combinations of the number of hidden layers, the number of neurons in each layer, and the type of
transfer function in each neuron (including free allocation of these transfer functions to any neuron). The
best combination for the considered problem was a network with one hidden layer (twelve logistic functions)
and one output layer (logistic function), with MSE = 1.59%. Only a small improvement was obtained over the
simple MLP (MSE = 2.77%), which indicates how good the traditional approach already is.

Other more advanced networks suitable for representing the studied problem were also considered, giving the
results summarised in Table 2. These networks correspond to the best architecture found for each case
considered. This application was developed using MATLAB [9,10], NEUROGENETIC OPTIMIZER [7]
and NEUROSHELL [11]. Two types of ANN gave good results: GRNN and GMDH. GRNN works like an
interpolating polynomial in which the training set is copied into the hidden layer. This type of network
will lead to satisfactory results when enough reliable data are available. Table 3 shows the main results
for the GMDH networks. It is interesting to note that this type of ANN resembles the regression models that
are familiar to researchers and engineers. The complexity necessary in any case will depend on the desired
precision and accuracy. In the studied cases, even simple models gave good results.


Table 2: Summary of results of best fitting some ANN.

MODEL                    MSE (%)
MLP simple               2.77
MLP / GAs                1.59
GRNN / GAs               3.21
GMDH simple / GAs        2.27
GMDH interm. / GAs       2.17
GMDH complex / GAs       1.95

The usefulness of the systems developed can be assessed in Figures 1 to 4, where calculated and
experimental values are compared for the considered problem. In all cases good agreement was found,
indicating reasonable solutions for all proposed models.




Table 3: Main results for GMDH networks.

Normalisation of input / output values:
X1 = 2*(x1 - 0.41)/0.33 - 1
X2 = 2*(x2 - 0.07)/0.34 - 1
X3 = 2*(x3 - 0.05)/0.09 - 1
X4 = 2*(x4 - 0.1)/0.03 - 1
Y  = 2*(y_exp - 7)/2 - 1

GMDH simple:
Y = 0.61 + 1.8*X1 + 1.3*X2 - 0.65*X1^2 + 2.2*X1*X3 + 2*X2*X3

GMDH intermediate:
Y = 0.33*X3 + 2.2*X1 + 2*X2 + 0.64 + 2.1*X2^2 + 0.36*X3^2 + 2.6*X1*X2 + 3.7*X1*X3 + 4*X2*X3
    + 0.44*X1^2 + 0.39*X1*X4 - 0.46*X1^3 + 0.15*X1*X4^2 - 0.42*X1^2*X4

GMDH complex:
Y = 2.1*X1 + 0.58 + 1.9*X2 + 0.33*X3 + 2.1*X2^2 + 0.35*X3^2 + 2.6*X1*X2 + 3.7*X1*X3
    + 3.9*X2*X3 + 0.95*X1^2 + 0.46*X1*X4 - 0.49*X1^3 + 0.21*X1*X4^2 - 0.45*X1^4 - 0.47*X1^2*X4
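
To show how such a model is applied, the sketch below evaluates the "GMDH simple" polynomial of Table 3, including the normalisation of the inputs and the inverse normalisation of the output. It assumes that x1, x2 and x3 are supplied as fractions (for example 0.7304 rather than 73.04), which is what the normalisation constants suggest; the function names are illustrative.

```python
def normalise(value, lower, span):
    """Map the interval [lower, lower + span] linearly onto [-1, 1]."""
    return 2.0 * (value - lower) / span - 1.0


def gmdh_simple(x1, x2, x3):
    """Flexural strength predicted by the 'GMDH simple' polynomial.

    x1, x2 and x3 are the C3S, C2S and C3A contents as fractions (an
    assumption based on the normalisation constants in Table 3).
    """
    X1 = normalise(x1, 0.41, 0.33)
    X2 = normalise(x2, 0.07, 0.34)
    X3 = normalise(x3, 0.05, 0.09)
    Y = (0.61 + 1.8 * X1 + 1.3 * X2 - 0.65 * X1 ** 2
         + 2.2 * X1 * X3 + 2.0 * X2 * X3)
    # Invert Y = 2*(y - 7)/2 - 1 to return to the original strength scale.
    return 7.0 + (Y + 1.0)
```

The intermediate and complex models can be evaluated in the same way by adding X4 and the extra polynomial terms.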


Fig. 1. Calculated vs. Experimental Data for best MLP / GAs (MSE = 1.59%).


Fig. 2. Calculated vs. Experimental Data for best GRNN (MSE = 3.21%).


CONCLUSIONS 

The analysed problem shows the applicability of artificial neural networks combined with genetic
algorithms as a modelling tool for the prediction of concrete mechanical behaviour. It is hoped that this
paper will encourage other applications of computational intelligence to the prediction of the mechanical
performance of other complex materials, especially those materials for which no model is available or for
which the traditional models do not offer the desired precision.




The combination of artificial neural networks and genetic algorithms is extremely powerful as a modelling
technique for complex, hard problems. Automating the model development process makes the application
quite simple, even for end users not familiar with neural programming and non-linear modelling.


Fig. 3. Calculated vs. Experimental Data for best simple GMDH (MSE = 2.27%).


REFERENCES 

1. J. C. S. Cassa, et al., 1998. Prediction of Concrete Mechanical Behaviour from Data at Lower Ages
Using Artificial Neural Networks. IPMM'99 - The Second International Conference on Intelligent
Processing and Manufacturing of Materials - Honolulu, Hawaii, July 10-15, 1999.

2. R. Hecht-Nielsen, 1990. Neurocomputing. Addison-Wesley.

3. S. Haykin, 1994. Neural Networks - A Comprehensive Foundation. Prentice Hall.

4. D.E. Goldberg, 1997. Genetic Algorithms in Search, Optimisation & Machine Learning.
Addison-Wesley.

5. J.C. Bean, 1994. Genetic Algorithms and Random Keys for Sequencing and Optimisation. ORSA
Journal on Computing, 6(2), 154-160.

6. L.M. Dragicevic and M.M. Rsumovic, 1987. Prognosis of characteristics of multicomponent materials on
the example of flexural strength of Portland cement. Cement and Concrete Research, 17, 47-54.

7. BIOCOMP, 1997. NeuroGenetic Optimizer. Biocomp Systems Inc.

8. M. Hagan, H. Demuth and M. Beale, 1996. Neural Network Design. PWS Publishing Company.

9. MATLAB, 1992. MATLAB High Performance Numeric Computation Software. The Mathworks.

10. MATLAB, 1993. MATLAB Neural Network Toolbox. The Mathworks.

11. NEUROSHELL, 1996. NeuroShell2. Ward System Group Inc.

12. C.E.S. Tango, 1991. Um estudo do desenvolvimento da resistência à compressão do concreto de cimento
Portland até 50 anos. Boletim 160 - IPT.

13. C.E.S. Tango, et al., 1995. Planilha eletrônica para previsão da resistência do concreto. Anais do 37°
REIBRAC - IBRACOM, 785-797.




Rough  Sets-based  Machine  Learning 
Using  a  Binary  Discernibility  Matrix 

Reynaldo  Felix,  Toshimitsu  Ushio 

Department  of  Systems  and  Human  Science 
Graduate  School  of  Science  and  Engineering 
Osaka  University,  Japan 


ABSTRACT 

This paper presents an approach with two methods to obtain minimal coverings in Rough Sets based
Machine Learning. Both methods are based on the definition of a binary discernibility matrix. The first
method is an exhaustive search of coverings and the second uses a genetic algorithm (GA) based search.
The approach represents the discernibility of two examples by a condition attribute of an information
system in a single bit. Thus, operations that are usually performed with a set approach are redefined in
order to use bit-wise logical operations. The algorithms for both methods are presented and discussed.


INTRODUCTION 

Rough sets theory is a mathematical approach to imprecision, vagueness and uncertainty in data analysis
[1]. The starting point of rough sets is the assumption that with every object of interest we associate some
information, also called knowledge. With this knowledge we can perform classifications, and this is the key
issue in reasoning, learning and decision making. Such classifications can be made not only on objects, but
also on abstract ideas or concepts, processes, moments of time, states, etc.

The knowledge can be irrelevant, redundant, uncertain, imprecise or incomplete, and it is usually obtained
from experience or from querying an expert about what action to take under different conditions.
This knowledge can be represented in an information system as examples described by condition attributes
and decision attributes. Rough sets theory is then used to remove superfluous data through measurement
of dependencies between attributes and to deal with inconsistencies through the definition of lower and upper
approximations [1]. The cleaned and consistent data is used to define sets of decision rules (i.e. a decision
table), where each rule is also described by condition and decision attributes.

The characteristics described above support the applicability of rough sets theory in machine learning,
considering that data in real applications is often vague and has many inconsistencies. The use of the
theory in methods of learning from examples and inductive learning was first proposed in [1] and
developed in several works, as pointed out in [4]. Nowadays there are many approaches to machine learning
based on rough sets. The theory has been used at different stages of the process of rule induction and
data preprocessing. This work focuses on the learning from examples approach. We first use the theory
for data preprocessing, to deal with inconsistencies in data and remove superfluous attributes, and then in
the rule induction stage, to obtain the coverings for the basic concepts.

The problem of finding minimal coverings is proven to be NP-hard, so the importance of developing more
efficient computational methods for rough sets applications is discussed in several works [2][3][8]. In this
paper, the proposed approach to finding minimal coverings is based on a binary representation of discernibility
between pairs of examples in a table, called the "binary discernibility matrix", and on a binary representation
of sets of condition attributes. Thus implicit parallel processing of up to 64 bits (condition attributes) is used.

The binary discernibility matrix enables us to detect inconsistent examples, irrelevant attributes and
indispensable attributes; to measure the capability of an attribute to classify objects; and mainly to find the


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




minimal coverings. In this representation, coverings are found by testing which sets of condition
attributes can discern all the pairs of examples in the table. Two methods for this search are proposed: the
first is an exhaustive search, while the second uses a genetic algorithm (GA) based search.


PRELIMINARIES  AND  NOTATIONS 

We consider an information system with a finite non-empty set (the universe) of objects that we are
interested in, denoted by $U = \{e_1, e_2, \ldots, e_m\}$; each object $e_i$ of $U$ is called an example or an instance. Those
objects are described by a finite non-empty set of condition attributes $A = \{a_1, a_2, \ldots, a_n\}$ and a finite
non-empty set of decision attributes $C = \{c_1, c_2, \ldots, c_p\}$. Thus, by using the description in $A$, the
examples in $U$ can be classified into concepts or categories of $C$. We define $B$ as a subset of the whole set of
condition attributes $A$, and $B_k$ as the subset composed of just the condition attribute $a_k$.

The decision attributes define the categories or equivalence classes [1], representing the family of concepts
in the information system (see Table 1). A concept or equivalence class will contain all the examples with
the same condition attribute values; thus $[e_i]_C$ is the concept to which the example $e_i$ belongs. In its
simplest form a decision attribute can take only two values: belongs to the concept or not (yes, no), so it
represents a basic concept. The set of examples of $U$ that has the decision attribute "yes" is called the set of
positive examples. Even though most cases include multi-valued decision attributes that
define several basic concepts, in our approach we split any multi-valued decision attribute into several
binary decision attributes in order to define all the basic concepts.


Table  1.  Information  system:  Computer  buying  criteria 


         Condition attributes                    Decision     Basic concepts        Concept Yes     Concept No
                                                 attribute    for Buy               approximations  approximations
Example  Speed (MHz)  Price   Software   Memory  Buy          Yes†  No†  Not dec.*  Lower*  Upper*  Lower*  Upper*
                              installed  (MB)
e1       200-300      Medium  Yes        128     Yes          Yes   No   No         Yes     Yes     No      No
e2       200-300      Low     Yes        64      No           No    Yes  No         No      No      Yes     Yes
e3       -200         Low     Yes        80      No           No    Yes  No         No      No      Yes     Yes
e4       300-         High    Yes        64      Yes          Yes   No   No         No      Yes     No      Yes
e5       300-         Medium  Yes        64      Not decided  No    No   Yes        No      No      No      No
e6       300-         High    Yes        64      No           No    Yes  No         No      Yes     No      Yes
e7       200-300      High    No         80      No           No    Yes  No         No      No      Yes     Yes
e8       -200         Low     No         128     No           No    Yes  No         No      No      Yes     Yes

Note: examples 4 and 6 are inconsistent. † Inconsistent basic concepts. * Consistent basic concepts.


Inconsistent representation of examples is a kind of uncertainty frequently found in knowledge bases.
When inconsistent examples exist (i.e. examples with the same condition attributes but belonging to
different concepts), we use a lower and an upper approximation, as defined in [1], to obtain concepts defined
by consistent examples (i.e. examples with the same condition attributes and belonging to the same
concept). In general, the lower approximation includes all objects that can be certainly classified as elements
of a concept, while the upper approximation includes all those that can be possibly classified as elements of a concept.

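Let $E_C = \{e_1, e_2, \ldots, e_q\}$ be a subset including all examples of $U$ belonging to the concept $C$, and let $[e]_C$ be the
concept to which $e$ belongs. Then the lower approximation $\underline{C}E_C$ and the upper approximation $\overline{C}E_C$ are:

$$\underline{C}E_C = \{ e \in U : [e]_C \subseteq E_C \} \qquad (1)$$

$$\overline{C}E_C = \{ e \in U : [e]_C \cap E_C \neq \emptyset \} \qquad (2)$$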

When the concept has inconsistent examples, we obtain the lower approximation and the upper
approximation, which correspond to the sets of certain rules and possible rules, respectively [8].
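The two approximations can be illustrated with a short Python sketch. It is not the authors' implementation: the function name `approximations` and the dictionary layout are hypothetical, the data are transcribed from Table 1 (assuming the speed of e3 reads "-200"), and examples are grouped by identical condition attribute values before each group is tested against the concept.

```python
from collections import defaultdict

# Condition attribute values (Speed, Price, Software installed, Memory) and the
# Buy decision for the eight examples, transcribed from Table 1.
# Assumption: the speed of e3 is read as "-200".
EXAMPLES = {
    "e1": (("200-300", "Medium", "Yes", 128), "Yes"),
    "e2": (("200-300", "Low", "Yes", 64), "No"),
    "e3": (("-200", "Low", "Yes", 80), "No"),
    "e4": (("300-", "High", "Yes", 64), "Yes"),
    "e5": (("300-", "Medium", "Yes", 64), "Not decided"),
    "e6": (("300-", "High", "Yes", 64), "No"),
    "e7": (("200-300", "High", "No", 80), "No"),
    "e8": (("-200", "Low", "No", 128), "No"),
}


def approximations(examples, concept):
    """Return (lower, upper) approximations of `concept` as sets of example names."""
    # Group the examples that are indiscernible by the condition attributes.
    classes = defaultdict(set)
    for name, (conditions, _decision) in examples.items():
        classes[conditions].add(name)

    # Examples whose decision attribute equals the concept.
    members = {name for name, (_c, decision) in examples.items() if decision == concept}

    lower, upper = set(), set()
    for block in classes.values():
        if block <= members:   # whole class inside the concept: certain members
            lower |= block
        if block & members:    # class overlaps the concept: possible members
            upper |= block
    return lower, upper
```

For the concept Yes, this sketch places only e1 in the lower approximation, while e4 and e6 (the inconsistent pair) appear only in the upper approximation.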




Knowledge  bases  sometimes  contain  redundant  or  irrelevant  information.  So,  to  simplify  data  without 
losing  information,  it  is  necessary  to  reduce  the  knowledge  base  by  obtaining  its  reducts. 

Pawlak [1] defines a reduct (also called covering) as "the minimal subset of attributes that discerns all
objects discernible by the whole set of attributes". In this definition, the whole set of attributes includes
condition attributes and decision attributes. When we consider only condition attributes, this concept turns
into that of a relative reduct. A relative reduct is the minimal subset of condition attributes that can discern
all the examples belonging to different concepts (positive and negative examples for a basic concept);
examples belonging to the same concept are not considered because they do not need to be discerned. It is
noted that all the supersets of a reduct are also a reduct [1]. We are mainly interested in the minimal reducts
(i.e. coverings containing as few attributes as possible). We will refer to a (minimal) relative reduct as a
(minimal) covering, a condition attribute as an attribute, and a decision attribute as a concept.


BINARY  REPRESENTATION 

The problem of finding minimal coverings has been proved to be NP-hard, so the importance of
developing more efficient computational methods for rough sets based applications has been discussed in
several works [2][3][4]. In our approach, we take advantage of bit-wise operations that a computer
performs in parallel on groups of 8, 16, 32 or 64 bits. For bigger groups, we divide the attributes into
groups and process them sequentially. A binary representation of the information system and of a set of
condition attributes $B \subseteq A$ is defined below.


Binary  Discernibility  Matrix 

The  proposed  method  is  based  on  the  construction  of  a  binary  table  to  represent  discernibility  between 
pairs  of  examples,  called  the  binary  discernibility  matrix. 


Table 2. a) Binary discernibility matrix for the information system in Table 1.

Pairs   a1  a2  a3  a4
e1 e2    0   1   0   1
e1 e3    1   1   0   1
e1 e4    1   1   0   1
e1 e5    1   0   0   1
e1 e6    1   1   0   1
e1 e7    0   1   1   1
e1 e8    1   1   1   0
e2 e3    1   0   0   1
e2 e4    1   1   0   0
e2 e5    1   1   0   0
e2 e6    1   1   0   0
e2 e7    0   1   1   1
e2 e8    1   0   1   1
e3 e4    1   1   0   1
e3 e5    1   1   0   1
e3 e6    1   1   0   1
e3 e7    1   1   1   0
e3 e8    0   0   1   1
e4 e5    0   1   0   0
e4 e6    0   0   0   0
e4 e7    1   0   1   1
e4 e8    1   1   1   1
e5 e6    0   1   0   0
e5 e7    1   1   1   1
e5 e8    1   1   1   1
e6 e7    1   0   1   1
e6 e8    1   1   1   1
e7 e8    1   1   0   1

b) Pairs considered in each basic concept for analysis. The basic concepts (columns) are Pending, L-Accept,
U-Accept, L-Reject and U-Reject; a + in a column marks the pairs of examples involved in that basic concept,
i.e. those pairs where one example belongs to the concept but not the other.




The discernibility $d_{B_k}(e_i, e_j)$ between two examples $e_i$ and $e_j$ by using a single condition attribute $B_k$, $\forall\, i \neq j$
and $[e_i]_C \neq [e_j]_C$, can be given in a binary representation (i.e. {1, 0} = {yes, no}), such that

$$d_{B_k}(e_i, e_j) = \begin{cases} 1 & \text{if } e_i \text{ is discernible from } e_j \text{ by } B_k \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

Such a binary representation of discernibility lets us record with a single bit whether a pair of examples is
discernible by $B_k$. In the binary discernibility matrix, columns are single condition attributes and rows are example
pairs belonging to different concepts.
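As an illustration of how such a matrix could be built, the following Python sketch encodes each pair of examples as one integer whose bit k-1 is set when attribute a_k takes different values in the two examples. The names `discernibility_row` and `binary_discernibility_matrix` are hypothetical, the data are transcribed from Table 1 (assuming the speed of e3 reads "-200"), and all pairs are generated; selecting the pairs that belong to a given basic concept is left out.

```python
from itertools import combinations

# Condition attribute values (a1 Speed, a2 Price, a3 Software, a4 Memory)
# transcribed from Table 1. Assumption: the speed of e3 is read as "-200".
CONDITIONS = {
    "e1": ("200-300", "Medium", "Yes", 128),
    "e2": ("200-300", "Low", "Yes", 64),
    "e3": ("-200", "Low", "Yes", 80),
    "e4": ("300-", "High", "Yes", 64),
    "e5": ("300-", "Medium", "Yes", 64),
    "e6": ("300-", "High", "Yes", 64),
    "e7": ("200-300", "High", "No", 80),
    "e8": ("-200", "Low", "No", 128),
}


def discernibility_row(x, y):
    """Return an integer whose bit k-1 is 1 when attribute a_k differs between x and y."""
    row = 0
    for k, (vx, vy) in enumerate(zip(x, y)):
        if vx != vy:
            row |= 1 << k   # the least significant bit stands for the first attribute
    return row


def binary_discernibility_matrix(conditions):
    """Map every unordered pair of examples to its discernibility bit string."""
    return {
        (p, q): discernibility_row(conditions[p], conditions[q])
        for p, q in combinations(conditions, 2)
    }


matrix = binary_discernibility_matrix(CONDITIONS)
```

With eight examples the dictionary holds 8 x 7 / 2 = 28 rows, and the row for the inconsistent pair (e4, e6) is zero.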

The number of condition attributes determines the length of the binary string. The number of rows in the
matrix for an information system with $m_e$ examples is

$$BDM_{rows} = \frac{m_e (m_e - 1)}{2} \qquad (4)$$

and the number of rows of the matrix involved in a basic concept which has $n_e$ examples is

$$BDM_{C_i} = n_e (m_e - n_e) \qquad (5)$$

The rows involved in a basic concept form a partial discernibility matrix for the concept. This matrix has
the following main patterns:

- If a row is all 0's, the corresponding pair of examples is indiscernible even when using
the whole set of condition attributes. The examples are then inconsistent, because they belong to
different concepts while having the same condition attribute values. In that case, the lower and upper
approximations for the concept should be used in order to deal with such inconsistencies.

- If a column is all 1's, then the attribute is capable of distinguishing all the pairs of examples that are
discernible by the whole set of attributes, and that is the definition of a covering. In such a case, a
covering has been found, and since it has only one attribute, we can say that it is a minimal covering.

- If a column is all 0's, then the attribute is completely irrelevant to the concept because it is unable to
distinguish any pair of examples by itself.

- If a row in the matrix has only one "1" in the bit string, then the corresponding attribute is the only one
able to distinguish that pair of examples, and so it is indispensable to the concept.

- The ratio of ones in a column to the number of pairs in the discernibility matrix measures how closely
that attribute (or combination of attributes) approximates a covering. It is called the
discernibility degree and is a part of the fitness function in the GA-based search method proposed later.
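The patterns listed above can be checked mechanically. The sketch below is an illustration under stated assumptions: each row of the partial matrix is held as a Python integer whose bit k-1 stands for attribute a_k, the function name `analyse_patterns` and the returned keys are hypothetical, and the all-1's column test skips all-0 rows, since those pairs are not discernible by any attribute.

```python
def analyse_patterns(rows, n):
    """Report the main patterns of a partial discernibility matrix with n attributes."""
    full = (1 << n) - 1
    discernible = [r for r in rows if r != 0]

    common = full   # bits set in every discernible row (columns of all 1's)
    union = 0       # bits set in at least one row (columns that are not all 0's)
    for r in discernible:
        common &= r
        union |= r

    return {
        # rows of all 0's: pairs of inconsistent examples
        "inconsistent_rows": [i for i, r in enumerate(rows) if r == 0],
        # attributes that alone discern every discernible pair
        "single_attribute_coverings": (
            [k + 1 for k in range(n) if (common >> k) & 1] if discernible else []
        ),
        # attributes that discern no pair at all
        "irrelevant": [k + 1 for k in range(n) if not (union >> k) & 1],
        # attributes that are the only "1" of some row
        "indispensable": sorted(
            {r.bit_length() for r in rows if r and (r & (r - 1)) == 0}
        ),
        # fraction of pairs discerned by each single attribute
        "degree": [sum((r >> k) & 1 for r in rows) / len(rows) for k in range(n)],
    }


# A small made-up matrix with four attributes (bit 0 is a1).
report = analyse_patterns([0b0011, 0b0001, 0b0101, 0b0000], n=4)
```

In this made-up matrix the fourth row flags an inconsistent pair, a1 alone discerns every discernible pair, a4 is irrelevant, and a1 is indispensable because of the row 0001.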


Binary  representation  of  sets 

We define a bit string with length $n$ (the number of condition attributes), where each bit corresponds to a
condition attribute. Figure 1 shows the conventional notation for bits and attributes, where the LSB $b_1$
corresponds to the first attribute.


Fig.  1.  Location  of  the  condition  attributes  in  a  binary  string. 


Given a set $B \subseteq A$, each bit of the bit string is set to "1" if the corresponding attribute is a member of the
combination; otherwise it is set to "0". For simplicity, we use the same symbol for both a set and its
representation in a binary string.
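A minimal Python sketch of this encoding follows. The attribute names and the helper names `encode` and `decode` are hypothetical; the only convention taken from the text is that the least significant bit corresponds to the first attribute.

```python
# Hypothetical attribute names; the least significant bit stands for the first one.
ATTRIBUTES = ["a1", "a2", "a3", "a4"]


def encode(subset, attributes=ATTRIBUTES):
    """Return the bit string (as an int) of a set of attribute names."""
    bits = 0
    for name in subset:
        bits |= 1 << attributes.index(name)
    return bits


def decode(bits, attributes=ATTRIBUTES):
    """Return the attribute names whose bits are set, in attribute order."""
    return [name for k, name in enumerate(attributes) if (bits >> k) & 1]


b = encode({"a1", "a4"})   # the set {a1, a4}
```

Printing `format(b, "04b")` shows the string with the most significant bit on the left, so the set {a1, a4} appears as 1001.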


Bit-wise  sets  operations 

It is well known that a computer processes bit strings of standard lengths (e.g. 8, 16, 32 or 64
bits) in parallel, and the function of the Boolean operations is equally well known. Thus, given the sets $B, D \subseteq A$ represented in bit
strings as shown in Fig. 1, we use the following operations:

- AND, $B \,\&\, D$. The result preserves only those attributes that are members of both sets. From a set point of
view this operation is equivalent to $B \cap D$.




- INCLUSIVE OR, $B \lor D$. The result preserves those attributes that are members of any of the sets, i.e.
$B \cup D$.

- EXCLUSIVE OR, $B \veebar D$. The result preserves those attributes that are members of one set but not of the
other, i.e. $(B \cup D) - (B \cap D)$.

- NOT, $!B$. This operation involves only one set. The result has as members only those attributes that
are not members of the set before the operation, so $B \,\cup\, !B = A$, or $!B = A - B$.
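The four operations map directly onto integer operators in most languages. The sketch below uses Python integers as bit strings with four hypothetical attributes. One detail is an implementation assumption rather than part of the text: Python integers are unbounded, so the complement has to be masked with the bit string of the whole set A.

```python
N = 4                    # number of condition attributes (hypothetical)
FULL = (1 << N) - 1      # bit string of the whole attribute set A, here 1111

B = 0b0011               # {a1, a2}
D = 0b0110               # {a2, a3}

both = B & D             # AND: members of both sets, B intersection D
either = B | D           # INCLUSIVE OR: members of any set, B union D
differ = B ^ D           # EXCLUSIVE OR: members of exactly one of the sets
not_b = ~B & FULL        # NOT: complement of B relative to A (masked to N bits)
```

Without the mask, `~B` would be a negative number in Python; in a fixed-width register the unused high bits would need the same masking.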

Now we will use these operations to build set operations, such as equality and inclusion of sets, that are
used in rough sets analysis to eliminate superfluous attributes and search for coverings.

- Equality of sets. Even though this operation does not require a logical operation, we can point out that

$$B \veebar D = \emptyset \quad \text{if } B = D \qquad (6)$$

- Inclusion of sets. The test is performed as follows: $B \veebar D$ shows the attributes in which the sets
differ. If $B \subseteq D$, only those attributes that are members of $D$ but not members of $B$ are included in the
result. Then $D \lor (B \veebar D)$ preserves the members of both the original $D$ and the previous result, so if
$B \subseteq D$ only the members of $D$ remain in the result.

$$B \lor (B \veebar D) = D \quad \text{if } B \subseteq D \qquad (7)$$

- Add elements to the set $B$. Let $D$ be a set whose members are the elements to be added to the set
$B$. $B'$ will include the elements of set $B$ plus the new elements.

$$B' = B \lor D \qquad (8)$$

- Remove elements from the set $B$. In a similar way, the elements of $D$ are removed from $B$ by using:

$$B'' = B \,\&\, (!D) \qquad (9)$$

- Calculate the discernibility degree by $B$. Let us suppose a pair of examples $i, j$ whose discernibility for $A$ is
described in $P_{ij}$ (a row in the binary discernibility matrix) and a set of attributes $B \subseteq A$:

$$P_{ij} \text{ is discernible by } B \text{ if } B \,\&\, P_{ij} \neq \emptyset \qquad (10)$$

Now if we apply this operation to all the pairs of examples included in the concept (denoted $P_C$), we
can define the discernibility degree as the number of pairs of examples in $P_C$ that the set $B$ can
distinguish, denoted $dd_B$.

- Find dispensable and indispensable attributes. Given $P_C$ (all the pairs involved in the concept $C$) and a
set $B_k \subset B$, such that $\forall P_{ij} \in P_C$, $(P_{ij} \,\&\, B) \neq \emptyset$, we have

- if $\forall P_{ij} \in P_C$, $P_{ij} \,\&\, (B \,\&\, !B_k) \neq \emptyset$, then $a_k$ is dispensable in $B$ to discern pairs in concept $C$. (11)

- if $\exists P_{ij} \in P_C$, $P_{ij} \,\&\, (B \,\&\, !B_k) = \emptyset$, then $a_k$ is indispensable in $B$ to discern pairs in concept $C$. (12)

- Find coverings of the concept. Given $B \subseteq A$, $B$ is a covering if

$$\forall P_{ij} \in P_C, \; (P_{ij} \,\&\, B) \neq \emptyset \qquad (13)$$

and it is considered a minimal covering if no proper subset $D \subset B$ is a covering.
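The set-level tests above translate into one-line integer expressions. The following Python sketch is illustrative only: all function names are hypothetical, and the rows in `PC` are the seven pairs that involve e1, computed from Table 1 under the assumption that the speed of e3 reads "-200". The minimality test removes one attribute at a time, which is enough because any superset of a covering is also a covering.

```python
def is_equal(b, d):
    return (b ^ d) == 0                 # XOR is empty when the sets coincide


def is_subset(b, d):
    return (b | (b ^ d)) == d           # holds exactly when B is included in D


def add(b, d):
    return b | d                        # B' = B OR D


def remove(b, d):
    return b & ~d                       # B'' = B AND (NOT D)


def discernibility_degree(rows, b):
    return sum(1 for r in rows if r & b)    # number of pairs that B discerns


def is_covering(rows, b):
    return all(r & b for r in rows)         # every pair is discerned by B


def is_dispensable(rows, b, bk):
    return is_covering(rows, b & ~bk)       # B still discerns all pairs without a_k


def is_minimal_covering(rows, b, n):
    if not is_covering(rows, b):
        return False
    # Dropping any single member must break the covering.
    return not any(
        (b >> k) & 1 and is_covering(rows, b & ~(1 << k)) for k in range(n)
    )


# Rows for the pairs (e1,e2) ... (e1,e8); bit 0 is a1, bit 3 is a4.
PC = [0b1010, 0b1011, 0b1011, 0b1001, 0b1011, 0b1110, 0b0111]
```

With these rows, {a1, a4} (1001) is a minimal covering, whereas {a1, a2, a4} (1011) is a covering in which a2 is dispensable.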


MINIMAL  COVERINGS  FINDING  ALGORITHMS 

Exhaustive  search  method 

This  method  is  a  direct  search  starting  from  a  single  attribute  and  afterwards  using  combinations  of  two 
attributes,  three  and  so  on.  Analysis  of  a  given  combination  of  attributes  B  is  performed  until  the 
combination  cannot  distinguish  a  pair  of  examples  (i.e.  until  a  zero  results  from  ANDing  B  and  the  rows  in 




the binary discernibility matrix). If all pairs are analyzed without obtaining zeros, then B is a covering. Because
the search begins from the simplest combinations, we know B is a minimal covering.

When a covering is found, all its supersets should be eliminated from the search space. Hence for each
succeeding combination, we test whether the covering is its subset. If so, the combination is not a minimal
covering. The search finishes when all possible combinations have been analyzed. Our problem is not only to
find one covering but all minimal coverings, or at least a set of best coverings. So we use a covering
pool to save all the coverings found, and we also use it as a reference to skip the analysis of supersets of
existing coverings in the pool.
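A compact Python sketch of this search is given below. It is a sketch under stated assumptions, not the authors' code: the function name `minimal_coverings` is hypothetical, the rows are integers with bit k-1 standing for a_k, all rows are assumed non-zero (inconsistent pairs already handled through the approximations), and the example rows are the seven pairs involving e1, computed from Table 1 with the speed of e3 read as "-200".

```python
from itertools import combinations


def minimal_coverings(rows, n):
    """Enumerate attribute combinations by increasing size and collect minimal coverings."""
    pool = []                                   # the covering pool
    for size in range(1, n + 1):
        for combo in combinations(range(n), size):
            candidate = sum(1 << k for k in combo)
            # Skip supersets of coverings already stored in the pool.
            if any(c & candidate == c for c in pool):
                continue
            # all() stops at the first pair the candidate cannot discern.
            if all(r & candidate for r in rows):
                pool.append(candidate)
    return pool


# Rows for the pairs (e1,e2) ... (e1,e8); bit 0 is a1, bit 3 is a4.
PC = [0b1010, 0b1011, 0b1011, 0b1001, 0b1011, 0b1110, 0b0111]
pool = minimal_coverings(PC, n=4)
```

For these rows no single attribute is a covering, and the pool ends with the four two-attribute sets {a1, a2}, {a1, a4}, {a2, a4} and {a3, a4}; every larger combination is skipped as a superset.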

It is important to point out here that when inconsistent examples are found in a concept $C$ and the lower and
upper approximations have been generated (i.e. $\underline{C}E_C$ and $\overline{C}E_C$), given the characteristics of these
approximations, it is possible to search coverings for only the upper approximation of the basic concept.
Then, in order to obtain the minimal coverings of the lower approximation, it is sufficient to test which
elements of the covering pool for the upper concept are also coverings of the lower concept.

GA  based  method 

The  main  stages  of  a  GA-based  method  are  described  below  and  some  important  considerations  for  its 
implementation  are  mentioned  [5]. 

In a GA, a population of individuals is used. Each individual represents, as a binary string, a combination of
attributes ($B \subseteq A$) that is considered a possible covering. We evaluate each individual to determine whether it
is in fact a covering or, if not, how near it is to being one. Individuals are then selected by a polarized (biased)
random process and recombined to produce new individuals, which are expected to be better approximations. The
process finishes when the stop criterion is satisfied. The covering search is therefore expected to be
faster than the exhaustive search.

Representation. Individuals in the GA population are sets of attributes as presented before (see Fig. 1),
so the length of an individual depends on the number of attributes. In this case it is also advisable to
keep a covering pool besides the population and to use the results stored in the pool instead of the
individuals of the last generation.

A GA usually starts from a randomly produced first generation, but a first generation composed of sets with
just one attribute seems a better option in our problem. The first generation then analyzes all single-attribute
combinations, which would save computational time. This evaluation of single-attribute sets also helps
orient the search toward more complex combinations.

Evaluation. This stage is very important because a good evaluation of individuals leads the search to
good results. The evaluation of individuals is done through a fitness function and can be quite time
consuming. We evaluate the fitness of a set $B$ using the following fitness function, where $dd_B$ is the degree
of discernibility of $B$, $BDM_{C_i}$ is the number of pairs involved in the concept, $n$ is the number of attributes
and $length$ is the set length measured as the number of members (ones) in the set (bit string).

$$f(B) = \frac{dd_B}{BDM_{C_i}} + \frac{n - length}{n - 1} \qquad (14)$$
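A possible implementation of this evaluation is sketched below in Python. It assumes that the two terms of the fitness function are added, which is a reading of the printed formula rather than a confirmed detail, and the function name `fitness` is hypothetical. The example rows are the seven pairs involving e1, computed from Table 1 with the speed of e3 read as "-200".

```python
def fitness(b, rows, n):
    """Fitness of the attribute set b: discerned fraction plus a reward for short sets."""
    dd = sum(1 for r in rows if r & b)       # discernibility degree dd_B (a count)
    length = bin(b).count("1")               # number of attributes in the set
    # Assumption: the two terms are combined by a sum.
    return dd / len(rows) + (n - length) / (n - 1)


# Rows for the pairs (e1,e2) ... (e1,e8); bit 0 is a1, bit 3 is a4.
PC = [0b1010, 0b1011, 0b1011, 0b1001, 0b1011, 0b1110, 0b0111]

f_covering = fitness(0b1001, PC, n=4)   # {a1, a4}: all 7 pairs discerned, 2 attributes
f_full = fitness(0b1111, PC, n=4)       # whole set A: all pairs discerned, 4 attributes
```

Under this reading the whole attribute set scores 1.0, and a two-attribute covering scores 1 + 2/3, so shorter coverings are preferred.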

Selection. Once the population is evaluated, a polarized random selection should be done. We implemented
two random sampling methods with similar results: the roulette wheel method and the stochastic universal
sampling (SUS) method [6].
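The two sampling schemes can be sketched as follows. This is a generic textbook-style illustration, not the implementation evaluated by the authors; the function names, the `rng` parameter and the toy population are hypothetical, and fitness values are assumed to be non-negative with a positive sum.

```python
import random


def roulette_wheel(population, fitnesses, k, rng=random):
    """Draw k individuals, each with probability proportional to its fitness."""
    total = sum(fitnesses)
    chosen = []
    for _ in range(k):
        pick = rng.uniform(0, total)
        cumulative = 0.0
        for individual, fit in zip(population, fitnesses):
            cumulative += fit
            if pick <= cumulative:
                chosen.append(individual)
                break
        else:
            chosen.append(population[-1])   # guard against rounding at the upper end
    return chosen


def stochastic_universal_sampling(population, fitnesses, k, rng=random):
    """Draw k individuals with k equally spaced pointers and a single random offset."""
    step = sum(fitnesses) / k
    start = rng.uniform(0, step)
    chosen = []
    index = 0
    cumulative = fitnesses[0]
    for j in range(k):
        pointer = start + j * step
        while cumulative < pointer and index < len(population) - 1:
            index += 1
            cumulative += fitnesses[index]
        chosen.append(population[index])
    return chosen


rng = random.Random(0)   # seeded only to make the example repeatable
parents_rw = roulette_wheel(["a", "b", "c"], [1.0, 0.0, 3.0], k=20, rng=rng)
parents_sus = stochastic_universal_sampling(["a", "b", "c"], [1.0, 0.0, 3.0], k=4, rng=rng)
```

SUS spreads its pointers evenly over the wheel, so with fitness values 1, 0 and 3 and four pointers it returns one copy of "a" and three copies of "c", while the roulette wheel reaches those proportions only on average.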

Crossover. After selecting the parents of the next generation, it is necessary to recombine them into new
individuals. Several methods have been proposed [5][7]. The procedures are relatively easy, and it is strongly
recommended to implement some of them and evaluate their performance on a given problem; the best one can
then be used for a specific type of problem.
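As one example of such an operator, the sketch below shows a one-point crossover on attribute bit strings held as Python integers. The text does not say which crossover operators were tried, so this choice, the function name `one_point_crossover` and the `rng` parameter are assumptions made for illustration.

```python
import random


def one_point_crossover(p1, p2, n, rng=random):
    """Swap the bits above a random cut point between two n-bit parents."""
    point = rng.randint(1, n - 1)        # cut between bit point-1 and bit point
    low = (1 << point) - 1               # mask of the bits below the cut
    child1 = (p1 & low) | (p2 & ~low)    # low part of p1, high part of p2
    child2 = (p2 & low) | (p1 & ~low)    # low part of p2, high part of p1
    return child1, child2


rng = random.Random(1)   # seeded only to make the example repeatable
c1, c2 = one_point_crossover(0b1111, 0b0000, n=4, rng=rng)
```

Each bit position of the children carries the same pair of values as in the parents, so attributes shared by both parents are always inherited by both children.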




Mutation. An additional but important stage in the GA is the mutation of individuals, which should be done
at very low rates. We mutate a random individual by complementing a random bit.
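The mutation step described here can be sketched in a few lines of Python. The function name `mutate`, the `rate` parameter (the probability that a mutation happens in a generation) and the `rng` parameter are hypothetical; the text only states that a random bit of a random individual is complemented at a very low rate.

```python
import random


def mutate(population, n, rate, rng=random):
    """With probability `rate`, complement one random bit of one random individual."""
    mutated = list(population)           # leave the original population untouched
    if rng.random() < rate:
        i = rng.randrange(len(mutated))  # random individual
        bit = rng.randrange(n)           # random attribute position
        mutated[i] ^= 1 << bit           # XOR with a one-bit mask flips that bit
    return mutated


rng = random.Random(3)   # seeded only to make the example repeatable
before = [0b0011, 0b1001, 0b1100]
after = mutate(before, n=4, rate=1.0, rng=rng)   # rate 1.0 forces a mutation here
```

In a real run the rate would be far below 1.0; the forced value only makes the single flipped bit visible in the example.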

Stop criteria. The desired stop criterion is that all minimal coverings have been found, which avoids
wasting time or obtaining a sub-optimal solution. Nevertheless, in our case this criterion is difficult to
implement and we do not yet have a method to prove it. Besides the traditional stop criterion at the $n_g$-th
generation, we stop when $n_c$ coverings have been found; this criterion lets us establish a size for the covering pool.
Nevertheless, depending on the information system under analysis, it may give non-minimal coverings.
Based on the measures given in rough sets theory, we can use the accuracy of approximation or the quality
of approximation as stop criteria [1], guaranteeing a minimal performance of classifications.

We  have  implemented  this  method  for  coverings  finding  but  it  has  not  been  completely  tested  or  evaluated. 


CONCLUSIONS 

The use of a matrix containing a binary representation for discernibility has been proposed and its
characteristics and properties discussed. Some operations oriented to set analysis in this representation are
also presented.

This representation takes advantage of the implicit parallel processing over groups of up to 64 bits to analyze
groups of attributes at one time. A drawback of the method is the matrix size, so the approach is
advantageous when solving problems with many attributes but relatively few examples.

We  have  tested  and  implemented  the  data  preprocessing  and  covering  finding  stages,  but  the  validation  and 
comparison  of  this  approach  with  traditional  methods  of  Machine  Learning  is  left  for  future  work.  Another 
important  task  is  its  implementation  in  an  inductive  learning  approach  which  will  enable  us  to  update  the 
rule  set  adding  new  knowledge. 


REFERENCES 

1.  Z.  Pawlak,  1989.  Rough  sets,  theoretical  aspects  of  reasoning  about  data.  Kluwer  Academic 
Publishers,  1991. 

2.  J.  Wroblewsky,  1998.  GA  in  decomposition  and  classification  problems.  In  “Rough  sets  in  knowledge 
discovery,  v.2”,  L.  Polkowski  and  A.  Skowron.  Physica-Verlag.  471-487. 

3.  J.W.  Guan  and  D.A.  Bell,  1998.  Rough  computational  methods  for  information  systems.  Artificial 
Intelligence  105,  77-103. 

4. J.W. Grzymala-Busse, J. Stefanowsky, W. Ziarko, 1996. Rough sets: facts versus misconceptions.
Informatica 20, 455-464.

5.  David  E.  Goldberg.  Genetic  algorithms  in  search,  optimization  and  machine  learning.  Addison-Wesley 
Publishing  Co.  Inc. 

6.  Peter  J.B.  Hancock,  1995.  Selection  methods  for  Evolutionary  Algorithms.  In  “Practical  handbook  of 
genetic  algorithms,  Volume  II,  New  Frontiers”.  CRC  Press.  67-92. 

7.  M.A.  Pawlowsky,  1995.  Crossover  operators.  In  “Practical  handbook  of  GA,  Volume  1,  Applications”. 
CRC  Press.  101-141. 

8.  C.C.Chan,  J.W.  Grzymala-Busse,  1998.  On  the  lower  boundaries  in  learning  rules  from  examples.  In 
“Incomplete  information:  rough  sets  analysis”.  Physica-Verlag.  58-74. 




Intelligence  in  the  Design  of  Materials  and  Processes  I 




Intelligold  -  an  Expert  System  for  Gold  Plant  Process  Design 

Vanessa  M.  Torres*  ,  Arthur  P.  Chaves**,  John  A.  Meech*** 

*  Companhia  Vale  do  Rio  Doce,  Belo  Horizonte,  MG,  Brazil 
**  Escola  Politecnica  da  Universidade  de  Sao  Paulo,  Sao  Paulo,  Brazil 
***  University  of  British  Columbia,  Vancouver,  BC,  Canada 


ABSTRACT 

Gold  mining  projects  are  a  rare  opportunity  in  the  minerals  industry.  They  require  relatively  small  capital 
and  give  high  profitability  and  fast  return  on  investment  compared  with  other  mineral  projects.  To  expand 
or  maintain  gold  production,  continuous  development  of  new  deposits  and  fast  implementation  of  new 
mining  sites  are  needed.  Process  design  is  one  of  the  major  issues.  As  simple  and  easily  extractable  ores  are 
almost  all  exhausted,  there  is  a  need  for  a  consistent  approach  to  deal  with  increasing  complexity  and 
decreasing  or  stagnant  gold  prices.  Process  design  must  consider  ore  genesis,  mineralogical  characteristics, 
ore  behaviour  in  available  metallurgical  processes,  linkage  with  the  mining  method,  environmental  impact, 
and  economic  issues.  The  type  of  work  and  environment  involved  makes  this  application  ideal  for  using  AI 
tools  such  as  Expert  Systems,  Fuzzy  Logic  and  Neural  Networks. 

This  paper  presents  Intelligold,  an  expert  system  for  project  development  teams  to  use  at  the  preliminary 
evaluation  and  conceptual  project  stages.  Information  and  knowledge  from  geology/mineralogy,  processing 
and  economics  are  organized,  and  recommendations  on  process  options  and  estimated  costs  and  revenue 
are  given.  The  "knowledge  building"  method  is  described,  together  with  implementation  and  verification. 
Success  in  building  this  system  suggests  application  to  other  ores  such  as  copper  and  complex  base  metals. 


INTRODUCTION 

Evolution of a gold project is a dynamic activity [1]. To expand or maintain production, continued
discovery of deposits and fast implementation of new mines are needed. After the rapid growth in gold
production in the 80's, following significant price increases in the late 70's, the industry is now faced with
low grade and/or complex ores [2]. We must optimise development from geological exploration through to
production and commercialisation to reduce the risk of making a poor decision. From the time of discovery
until the first bar is poured, careful work and thought are needed. Input from geology, engineering,
architecture, economics, sociology and biology is needed over many years until start-up can occur.

There  is  a  need  for  consistency  to  deal  with  the  paradox  of  making  a  profit  despite  increased  complexity 
and  decreased  or  stagnant  gold  prices.  Plant  design  must  consider  ore  genesis,  mineralogical  characteristics, 
ore  behaviour  in  available  metallurgical  processes,  mining  method,  environmental  impact,  and  economics. 
Uncertainty  makes  this  environment  ideal  for  Expert  Systems,  Fuzzy  Logic  and  Neural  Networks  [3]. 


PROJECT  DEVELOPMENT:  IMPRECISION,  RISK  AND  DECISION-MAKING 

Success  of  a  new  mine  depends  on  many  factors.  The  answer  to  a  simple  question  is  key,  "is  it  a  'good 
project'?"  Unfortunately,  the  answer  is  far  from  simple,  especially  at  the  preliminary  stages.  The  search  for 
an  answer  is  termed  a  "feasibility  study":  an  interactive  process  to  gather  and  evaluate  information. 
Development  is  performed  in  gradual  stages  as  follows: 

geological  assessment 
mineralogical  assessment 
ore  behaviour  assessment 
process  and  flowsheet  selection 
equipment  selection  and  sizing 
economic  analysis 


0-7803-5489-3/99/$  10.00  ©1999  IEEE 




In  the  preliminary  stages,  information  may  be  sufficient  to  classify  the  project  as  a  bad  one,  but  it  is  almost 
never  extensive  enough  to  ensure  the  project  will  be  profitable.  The  boundary  between  a  poor  prospect  and 
a  good  one  is  fuzzy:  the  prospect  can  be  clearly  poor  (e.g.  there  is  no  gold  in  the  ore,  grade  is  too  low,  etc), 
possibly  poor,  possibly  good  and  clearly  good.  Figure  1  illustrates  these  fuzzy  concepts. 


Fig.  1.  Uncertainty  of  project  development. 


Technical  feasibility  with  maximum  precision  is  only  possible  when  detailed  engineering  is  complete. 
From  the  early  stages  of  research,  investment  is  needed  and  decisions  are  made  to  move  on  to  the  next 
development  stage.  Table  1  illustrates  the  characteristics  of  the  development  stages  of  a  gold  project. 


Table 1. Development stages of a mineral project.

Phase of development | Accumulated cost (% total capital cost) | Information available | Error in capital cost estimation (estimated - actual)
Preliminary evaluation | negligible | very uncertain, "order of magnitude" evaluation | from +40 to -20%
Conceptual project | 0.6 to 1% | uncertain, yet sufficient to outline the main characteristics | from +20 to -12%
Basic engineering | ~10% | reliable geological and metallurgical data, basic layout and equipment sizing | from +15 to -10%
Detailed engineering | ~100% | final geological and metallurgical reports, detailed layout and equipment specification | from +7 to -5%


THE  USE  OF  DIAGNOSTIC  TOOLS  FOR  GOLD  PROCESS  PLANT  DESIGN 

The need to evaluate potential gold resources with increased accuracy in a timely fashion has led to the
development  of  diagnostic  methods  at  many  laboratories  using  standard  procedures  for  cyanidation, 
flotation,  gravity  recovery  and  various  leaching  options  [4].  Because  of  these  efforts  together  with  the 
evolution  of  new  processes  for  refractory  ores,  gold  processing  can  be  considered  a  separate  subject  within 
the  field  of  mineral  processing.  Good  textbooks  are  available  [5,6]  and  many  papers  appear  each  year 
describing  technological  breakthroughs.  As  process  design  is  only  one  step  in  project  evaluation,  we  have 
integrated  relevant  testwork  with  data  from  geology,  mineralogy,  economics  and  environmental  issues.  Our 
system  is  intended  for  multidisciplinary  teams  from  the  very  beginnings  of  a  new  discovery. 

AN  EXPERT  SYSTEM  FOR  GOLD  ORES 

An expert system for gold ores, IntelliGold, has been developed as a tool for project development teams to
use  at  both  the  preliminary  evaluation  and  conceptual  project  stages.  Data  and  knowledge  from  geology, 
mineralogy,  processing  and  economics  are  co-ordinated  in  the  analysis.  The  two  main  system  features  are: 

•  an  inference  system  able  to  suggest  processing  options  for  a  specific  ore  and  estimate  costs  and  revenue 
even  when  the  data  are  uncertain.  The  system  establishes  the  main  risk  factors  in  each  recommendation 
to  point  out  areas  for  additional  research.  By  using  this  tool,  a  development  team  can  be  directed 
towards  solutions  that  are  more  likely  to  increase  profitability  and  decrease  risk  of  failure; 




•  a  hypertext  document  containing  state-of-the-art  knowledge  on  gold  processing  and  case  studies  for 
different  ores  and  existing  plant  flowsheets,  to  provide  easy  access  to  the  material  and  references. 

The  system  is  aimed  at  geologists,  research  engineers,  project  engineers  and  mineral  economists  involved 
in  gold  projects.  It  can  provide  feedback  to  each  area  individually,  and  assist  an  entire  team  when  working 
together  in  a  workshop.  It  can  help  evaluate  prospect  acquisition  or  joint  ventures.  Finally,  the  system  can 
help  train  new  professionals  to  the  field. 

From  ore  and  deposit  features  collected  during  initial  geological  work,  decision  rules  choose  processing 
options.  Unit  operations  are  assembled  and  sized.  With  process  routes  defined,  cost  and  revenue  are 
calculated  from  existing  models  and/or  historical  data.  Options  are  sorted  as  to  their  potential  return  and 
associated  risk.  The  hypertext  document  can  retrieve  information  on  existing  similar  mines.  The  user 
decides  which  process  to  investigate  further,  or  to  abandon,  hold  or  implement  the  project  (see  Figure  2). 


[Figure 2: flow diagram. Box labels: Ore/deposit information; Decision rules, fuzzy sets; Process routes indication; Cost and revenue calculations; Ranking of alternatives (alternatives, NPV/IRR, main risk factors); Decision making, with outcomes: Continue investigations (do research, gather more data), Abandon or hold, Implement project.]

Fig. 2. System structure as perceived by the user.

We  chose  to  use  Comdale/X  as  the  development  tool  for  this  project.  This  software  tool  defines  variables  as 
"keyword  triplets",  characterised  by  3  elements:  object,  attribute  and  data  type.  Triplets  are  grouped  into 
classes,  which  are  organised  hierarchically.  Triplets  can  be  numeric,  string  or  fuzzy.  Fuzzy  triplets 
transform  a  numerical  measurement  into  a  facet  called  Degree  of  Belief  (DoB)  which  varies  between  0  (F) 
and 100 (T). Rules in the form of IF-AND-OR-THEN-ELSE are used to conduct inferencing [7].
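
The mapping from a numerical measurement to a Degree of Belief can be illustrated with a minimal sketch. The class name, the linear ramp and the 70-90% recovery thresholds below are assumptions made for illustration only; they are not the Comdale/X data structures or the membership functions used in IntelliGold.

```python
from dataclasses import dataclass


@dataclass
class FuzzyTriplet:
    """Hypothetical stand-in for an object/attribute/fuzzy-value triplet."""
    obj: str
    attribute: str
    low: float   # measurement at or below which the DoB is 0 (false)
    high: float  # measurement at or above which the DoB is 100 (true)

    def degree_of_belief(self, measurement: float) -> float:
        """Map a measurement onto a DoB between 0 and 100 with a linear ramp."""
        if measurement <= self.low:
            return 0.0
        if measurement >= self.high:
            return 100.0
        return 100.0 * (measurement - self.low) / (self.high - self.low)


# Assumed thresholds: recovery below 70% is "not high", above 90% is "high".
high_recovery = FuzzyTriplet("cyanidation", "recovery_is_high", low=70.0, high=90.0)
print(high_recovery.degree_of_belief(80.0))  # 50.0
```

Any monotonic curve could replace the linear ramp; the point is only that a rule premise then sees a belief value rather than a raw number.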

IntelliGold  has  been  developed  as  a  series  of  modules,  so  the  system  naturally  expands  as  new  modules  are 
developed  and  added.  This  approach  allows  future  application  to  deal  with  other  ores,  such  as  copper,  iron, 
lead/zinc  and  nickel. 

Even if data are missing, the system can infer results from knowledge derived in a prior stage. As data
enter the system, the DoB in a conclusion improves and the options narrow to 1 or 2 flowsheets.

Knowledge  Building  Structure 

To  provide  a  flowsheet  based  on  ore  mineralogical  and  metallurgical  data,  the  system  links  information  and 
knowledge  in  a  way  similar  to  human-reasoning.  Information  is  represented  by  variables  and  knowledge  by 
rules. The system contains ~1300 variables and 600 rules. Different information classes are identified:

•  deposit  type  and  geology; 

•  mineralogy; 

•  metallurgical  behaviour; 

•  response  of  the  ore  for  various  processes; 

•  combination  of  processes  into  a  process  route. 




Ultimately,  ore  behaviour  defines  the  process  to  be  used.  Ideally,  we  would  like  to  have  all  ore  processing 
information,  with  all  variables  and  scale-up  factors  pre-defined.  To  achieve  this  in  the  early  development 
stages,  we  must  infer,  approximate,  or,  even  guess,  the  process  from  geology,  mineralogy,  or  preliminary 
testwork.  The  knowledge  base  rules  that  suggest  unit  operations  and  select  process  routes  try  to  correlate 
aspects  of  geology,  mineralogy  and  ore  behaviour  with  the  many  processes  available  for  gold  ores. 

The rules that link several aspects in each class are structured into layers, since many geological premises
can indicate mineralogical characteristics, which in turn imply process behaviour, which then defines the
processes to be tested. As more information is generated, more accurate predictions are derived. As testing
progresses  through  ore  behaviour  and  process  determination,  the  system  ultimately  determines  if  the 
prospect  is  poor  or  good  with  reasonable  belief.  Figure  3  depicts  the  knowledge-building  structure. 
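
The layering described above can be sketched as a small forward-chaining loop. The fact names, the rule certainties and the rule for combining beliefs (minimum of the premises, scaled by the rule certainty) are illustrative assumptions; they are not taken from the IntelliGold rule base or from the Comdale/X inference engine.

```python
# Each hypothetical rule: (premise facts, conclusion fact, rule certainty 0-100).
RULES = [
    (("deposit:greenstone_quartz_vein",), "mineralogy:sulphide_bearing", 80.0),
    (("mineralogy:sulphide_bearing",), "behaviour:high_cyanide_consumption", 90.0),
    (("behaviour:high_cyanide_consumption",), "process:test_pre_aeration", 90.0),
]


def forward_chain(facts):
    """Fire rules layer by layer until no belief increases.

    The belief in a conclusion is the minimum DoB of its premises scaled by
    the rule certainty, so belief weakens with each inferred layer.
    """
    beliefs = dict(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion, certainty in RULES:
            if not all(p in beliefs for p in premises):
                continue
            dob = min(beliefs[p] for p in premises) * certainty / 100.0
            if dob > beliefs.get(conclusion, 0.0):
                beliefs[conclusion] = dob
                changed = True
    return beliefs


result = forward_chain({"deposit:greenstone_quartz_vein": 100.0})
for fact, dob in result.items():
    print(f"{fact}: {dob:.1f}")
```

In this sketch a conclusion three layers away from the geology carries less belief than one a single layer away, which mirrors the statement that predictions sharpen only as measured data replace inferred values.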

Inference  and  Feedback 

As  can  be  seen  in  Figure  3,  each  variable  is  a  combination  of  inferencing  and  interpretation  of  experimental 
data.  The  system  works  from  geology  upward  to  process  route  selection  accumulating  and  weighting 
information  at  each  level.  The  weights  used  to  combine  inferred  and  measured  variables  are  derived  from 
the  conditions  under  which  the  tests  were  done  such  as  sample  quality.  The  following  situations  can  occur: 

•  There  is  only  an  inferred  certainty  for  a  variable:  in  this  case,  the  combined  belief  is  the  average  of 
the  inferred  value  (which  is  different  from  50)  and  default  value  (which  is  50),  and  so,  confidence  in 
the variable is diminished since there is no measurement to verify it;

•  There  is  only  a  measured  certainty  for  a  variable:  in  the  same  way,  belief  is  diminished  because  of 
a  lack  of  reason  for  the  measurement.  However,  the  degree  of  amortisation  is  low  if  the  weight  of  the 
measurement  is  high  such  as  a  good  analysis  on  a  representative  sample; 

•  The  certainties  of  the  measured  and  inferred  variables  are  either  both  true  or  false:  combined 
belief  lies  between  the  measured  and  inferred  triplets,  depending  on  their  respective  weights; 

•  The certainties of the measured and inferred variables are discordant: combined belief tends
toward the fact with the higher weight. The system alerts the user to the disagreement, which may be
due  to  poor  sampling,  analysis  error,  incorrect  grouping,  etc.,  or  an  unusual  deposit  or  ore. 
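
The four situations above can be sketched as a single weighted mean. This is one possible reading under stated assumptions: the function name, the unit weight given to the default value and the test for discordance (the two beliefs lying on opposite sides of 50) are not from the paper, which does not give the exact formula.

```python
DEFAULT_DOB = 50.0  # neutral belief used when one source is missing


def combine_belief(inferred=None, measured=None, w_inferred=1.0, w_measured=1.0):
    """Combine inferred and measured DoBs (0-100) by a weighted mean.

    Returns (combined DoB, discordant flag). Assumptions of this sketch:
    - only inferred: plain average of the inferred value and 50;
    - only measured: the default 50 enters with unit weight, so a heavily
      weighted measurement is attenuated less;
    - discordant means the two beliefs lie on opposite sides of 50.
    """
    if inferred is None and measured is None:
        return DEFAULT_DOB, False
    if measured is None:
        measured, w_measured = DEFAULT_DOB, w_inferred
    elif inferred is None:
        inferred, w_inferred = DEFAULT_DOB, 1.0
    combined = (w_inferred * inferred + w_measured * measured) / (w_inferred + w_measured)
    discordant = (inferred - DEFAULT_DOB) * (measured - DEFAULT_DOB) < 0
    return combined, discordant


print(combine_belief(inferred=78.0))                                   # (64.0, False)
print(combine_belief(measured=90.0, w_measured=4.0))                   # (82.0, False)
print(combine_belief(inferred=80.0, measured=20.0, w_measured=3.0))    # (35.0, True)
```

The last call shows the discordant case: the result moves toward the more heavily weighted measurement and the flag would trigger the alert to the user.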

Management  of  incompatible  data  is  only  one  way  the  system  feeds  back  variables.  Feedback  also  comes 
from  the  economic  analysis  which  is  applied  after  the  flowsheet  design  stage. 

Geology  to  Mineralogy  Inference 

Mineralogical  inferencing  is  based  on  the  characteristics  of  most  common  gold  deposits.  The  classification 
of Marsden [5] was used for ore types while, for deposit types, that proposed by Paterson [8] was chosen.

Classification  of  deposit  types  is  driven  by  ore  genesis,  i.e.,  geological  issues.  On  the  other  hand,  ore  types 
are  classified  by  mixed  geological/mineralogical  characteristics  of  the  ore  and,  in  the  case  of  "free-milling" 
ores,  behaviour  during  cyanidation  is  determinant.  The  term  deposit  type  is  the  terminology  of  field 
geologists  while  ore  type  is  the  language  of  petrographic  geologists  or  mineralogists.  A  literature  review 
on  gold  deposits  and  projects  was  performed  to  derive  common  ore  characteristics  for  a  given  ore  type  or 
deposit  type,  resulting  in  typical  ores  for  each  classification.  Of  course,  there  are  deposit  types  in  which 
several  ore  types  may  occur.  In  these  cases,  the  system  gives  several  options,  but  with  low  belief.  Inferred 
mineralogy  is  combined  with  experimental  observations  using  the  weighting  method  described  above. 

Mineralogy  to  Behaviour  Inference 

The  inference  of  ore  behaviour  aims  to  correlate  mineralogical  variables  and  ore  metallurgy.  The  approach 
is  consistent  since  the  objective  of  all  processing  plants  is  to  modify  ore  mineral  properties  to  effect 
separation  and  then,  to  perform  selective  destruction  to  extract  valuable  elements  (gold).  The  system  uses 
rules  derived  from  interviews  with  experts,  the  authors'  experience  and  a  literature  review  to  instantiate 
inferred  behaviour  variables.  An  important  issue  is  metallurgical  testwork.  IntelliGold  interprets  numerical 
data  into  linguistic  expressions  that  characterize  process  behaviour.  The  system  uses  interpretation  rules  for 
laboratory/pilot  tests  on  major  unit  operations  such  as  comminution,  gravity  processing,  cyanidation, 
flotation,  diagnostic-leaching,  pre-oxidation  and  solution  purification/Au  recovery. 




Fig.  3.  Knowledge-Building  Structure. 


Selection  of  Industrial  Processes 

Once  basic  ore  behaviour  is  established,  the  system  selects  the  industrial  processes  to  be  applied  to  the  ore. 
The  selection  of  industrial  processes  considers  metallurgical  behaviour  together  with  general  guidelines 
such  as  typical  gold  recovery,  throughput  and  head  grade  ranges. 

Selection  of  Process  Routes  and  Flowsheet  Design 

IntelliGold  continues  its  consultation  by  building  flowsheet  options.  To  accomplish  this,  the  system  uses  a 
combination  of  rules  and  default  values  to  select  unit  operations  from  primary  comminution  through  to  gold 
smelting.  The  system  defines  different  flowsheet  "options"  since  more  than  one  alternative  may  be  possible. 

It  is  important  to  enter  the  economic  evaluation  module  with  more  than  one  flowsheet,  since  final  choice 
depends  on  economic,  political  and  environmental  factors.  The  user  must  be  shown  more  than  one  feasible 
option,  since  site-specific  factors  unknown  to  the  system  may  add  more  certainty  to  one  specific  flowsheet. 

Metallurgical  Report  and  Flowsheet  Drawing 

The  final  step  in  the  process  selection  module  is  the  report  generation  and  flowsheet  drawing  procedure.  All 
input  data  and  results  generated  during  the  consultation  are  arranged  within  a  hypertext  report.  The  user  can 
browse  through  the  report,  go  to  a  specific  page  or  print  the  report.  Using  rules  and  graphical  files,  the 
system  is  able  to  draw  a  flowsheet  representing  each  possible  ore  process  selected  by  the  system.  The 
flowsheet is designed as a series of block diagrams, each of which represents a specific unit operation. By
clicking  on  a  unit  operation,  users  can  search  the  hypertext  document  for  details  on  each  unit  operation, 
equipment  sizes,  possible  problems,  and  a  picture  of  an  actual  unit.  Flowsheets  are  generated  as  in  Figure  4. 

Economic  Evaluation  of  Process  Routes 

Recommended  processes  are  input  to  the  economic  module.  The  module  calculates  capital  and  operating 
costs  for  each  unit  operation,  using  actual  costs  and  adjustment  factors  to  correct  for  inflation  (using  the 
M&S  index),  location,  tonnage  rate  or  plant  size,  availability  of  salvaged  equipment,  etc. 
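
A cost adjustment of this kind can be sketched as a product of correction factors. The function and parameter names are hypothetical, and the capacity exponent of 0.6 is the common "six-tenths" rule of thumb, used here as an assumption; the paper does not state which scaling law or factor values IntelliGold applies.

```python
def adjusted_capital_cost(base_cost, base_index, current_index,
                          base_capacity, capacity,
                          scale_exponent=0.6, location_factor=1.0):
    """Escalate a historical unit-operation cost to a new date, size and site.

    base_index / current_index: cost index values (for example the M&S index)
    at the date of the historical cost and at the evaluation date.
    scale_exponent: assumed capacity exponent (0.6 is a rule of thumb).
    location_factor: assumed multiplier for site-specific conditions.
    """
    inflation = current_index / base_index
    size = (capacity / base_capacity) ** scale_exponent
    return base_cost * inflation * size * location_factor


# Hypothetical numbers: a 1000 t/day unit that cost 1.0 M when the index was 800,
# re-estimated for 2000 t/day at an index of 1000.
print(adjusted_capital_cost(1_000_000, 800, 1000, 1000, 2000))
```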

The  system  determines  project  revenue  from  reserve  size,  estimated  throughput,  gold  grade  and  recovery. 
Net Present Value and Internal Rate of Return are generated from the cost and revenue data. The main
sources of uncertainty and project risk are identified and the user is shown which testwork parameters must
be  confirmed,  optimised  or  reviewed.  This  provides  feedback  on  process  behaviour  and  flowsheet  design. 
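
The revenue, NPV and IRR calculations can be sketched as follows. The throughput, grade and recovery are the Fazenda Brasileiro inputs quoted in this paper; the gold price, capital cost, operating cost and project life are hypothetical placeholders, and the bisection search for the IRR is an illustrative choice rather than the method used by the economic module.

```python
def annual_gold_revenue(throughput_t_per_day, grade_g_per_t, recovery,
                        price_per_g, days_per_year=365):
    """Yearly revenue from recovered gold (recovery given as a fraction)."""
    return throughput_t_per_day * days_per_year * grade_g_per_t * recovery * price_per_g


def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs now, cash_flows[t] after t years."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, low=-0.9, high=10.0, tol=1e-9):
    """Internal rate of return by bisection, assuming one sign change of NPV."""
    f_low = npv(low, cash_flows)
    if f_low * npv(high, cash_flows) > 0:
        raise ValueError("NPV does not change sign inside the search bracket")
    while high - low > tol:
        mid = (low + high) / 2.0
        f_mid = npv(mid, cash_flows)
        if f_low * f_mid <= 0:
            high = mid
        else:
            low, f_low = mid, f_mid
    return (low + high) / 2.0


# 3200 t/day at 5 g/t and 95% recovery; the price of 10 per gram is hypothetical.
revenue = annual_gold_revenue(3200, 5.0, 0.95, price_per_g=10.0)
# Hypothetical capital cost (100 M), operating cost (30 M/year) and 10-year life.
cash_flows = [-100e6] + [revenue - 30e6] * 10
print(f"NPV at 10%: {npv(0.10, cash_flows):,.0f}")
print(f"IRR: {irr(cash_flows):.1%}")
```

Re-running such a calculation with the pessimistic and optimistic ends of each testwork parameter is one simple way to see which parameter dominates project risk.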




Management  of  Contradictions 

An  important  feature  involves  management  of  contradictions.  A  contradiction  is  found  when  inferred  and 
measured  conclusions  are  discordant,  or,  using  a  fuzzy  logic  definition,  when  the  difference  between  their 
DoBs  exceeds  50%.  A  contradiction  occurs  when: 

•  facts cause rules to fire that give two or more possible scenarios. Then, when the system considers
measured data, only one or some of these conclusions are confirmed;

•  facts cause rules to fire which give a single possibility that is denied by measured information.

In  the  first  case,  there  is  no  need  to  adapt  the  system,  since  narrowing  existing  possibilities  using  additional 
data  is  an  expected,  natural  outcome  of  the  reasoning  process.  In  the  second  case,  it  is  important  to  manage 
such  a  contradiction,  especially  if  it  involves  key  elements.  This  type  of  contradiction  occurs  when: 

•  the information available is inconsistent for a wide variety of reasons;

•  the rules that instantiate the inferred variable are incorrect or incomplete;

•  the specific ore is an exception to the rules because of incomplete knowledge in the system.

Users  can  review  the  measured  data.  If  a  contradiction  exists  after  this  "correction",  the  system  examines 
how  and  why  an  inference  was  made,  providing  another  opportunity  to  adjust  the  model.  If  conflict  still 
persists  then  the  rule  base  is  incomplete  or  the  case  is  an  exception.  The  system  allows  data  weights  to  be 
changed  to  prioritize  more  reliable  data.  Such  adjustments  are  noted  to  allow  a  developer  to  study  the 
likelihood  of  new  knowledge  about  certain  data  combinations.  Contradiction  management  is  done  for  "key" 
process  selection  facts,  such  as  "the  ore  is  refractory"  or  "free  gold  is  available  for  gravity  concentration". 
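
The contradiction test on key facts can be sketched as a simple comparison of belief values. The threshold of 50 is the figure quoted in the text; the function name, the fact labels and the example beliefs are hypothetical.

```python
CONTRADICTION_THRESHOLD = 50.0  # DoB difference quoted in the text


def find_contradictions(inferred, measured, key_facts):
    """Return key facts whose inferred and measured DoBs differ by more than 50.

    inferred, measured: dicts mapping a fact label to a DoB (0-100).
    Facts that lack either an inferred or a measured belief are skipped.
    """
    flagged = []
    for fact in key_facts:
        if fact in inferred and fact in measured:
            gap = abs(inferred[fact] - measured[fact])
            if gap > CONTRADICTION_THRESHOLD:
                flagged.append((fact, inferred[fact], measured[fact]))
    return flagged


# Hypothetical beliefs for two key process-selection facts.
inferred = {"ore is refractory": 85.0, "free gold available for gravity": 60.0}
measured = {"ore is refractory": 20.0, "free gold available for gravity": 55.0}
print(find_contradictions(inferred, measured, list(inferred)))
# [('ore is refractory', 85.0, 20.0)]
```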

Justification  for  Process  Choice 

Justification  of  each  process  choice  is  also  important  as  it  pinpoints  the  reasons  and  factors  that  lead  the 
system to its conclusions. This was implemented by adding the reasons why each rule fires in the form of a text
variable. At the end of a consultation, the main factors used and concluded at each level of inference are
presented  to  the  user  in  a  hypertext  report  with  output  and  the  appropriate  rationale  for  each  conclusion. 

System  Validation 

Case  studies  of  existing  plants  and  simulated  cases  analysed  separately  by  experts  show  the  system  can 
identify different deposit types and the effect of mineralogy on metallurgical behaviour (see Table 2).
Inferred behaviour is refined with added data, as seen for Fazenda Brasileiro mine and the Carlin-type
deposit. The Witwatersrand case shows two inferred options which can be verified later through testwork.


CASE  STUDY:  FAZENDA  BRASILEIRO  MINE 

System  verification  was  also  performed  by  simulating  a  complete  feasibility  study  on  an  existing  mine. 
Fazenda  Brasileiro  is  one  of  the  major  gold  mines  operating  in  Brazil,  with  an  annual  production  of  5  tons 
of  gold.  It  started  operation  in  1985.  To  simulate  the  design  process,  only  geological  and  metallurgical 
preliminary  reports  dated  1982  were  used  to  provide  system  input.  The  input  data  is  presented  below: 

With  these  inputs,  the  system  generated  the  flowsheet  depicted  in  Figure  4.  The  suggested  process  route  is 
very  similar  to  the  actual  flowsheet  at  Fazenda  Brasileiro  mine.  In  fact,  the  process  implemented  in  1985 
initially  used  Cyanidation  and  CIP  instead  of  CIL  to  extract  the  gold  but,  in  1995,  the  plant  was  converted 
to  CIL  after  comparative  testwork  showed  CIL  gave  higher  Au  recovery.  The  grinding  circuit  is  also 
different  as  it  uses  finer  crushing  with  more  stages  to  feed  a  ball  mill  directly  which,  at  the  time  of 
implementation,  was  a  more  conservative  and  appropriate  approach.  Today,  with  the  evolution  and  success 
of  SAG  grinding,  this  would  be  the  preferred  choice. 


CONCLUSION 

Development  of  an  expert  system  to  design  processes  for  gold  ores  is  justified  for  the  following  reasons: 

-  process  design  is  an  important  issue  in  gold  projects  as  it  affects  both  technical  and  economic 
feasibility  of  exploiting  an  ore.  Co-ordination  of  data  and  people  needed  to  conduct  the  design  is  often 
difficult  since  information  is  uncertain  and  not  all  "experts"  are  available  at  one  time  and  place; 




-  knowledge of gold processing options can be critical in providing input to the early decision-making to
continue to explore and to evaluate the prospect.

IntelliGold  was  developed  to  provide  a  consistent  methodology  to  integrate  information  from  different 
areas of project development and to estimate and infer possible process options. The system aims to give a
basis  for  decision-making  during  the  initial  project  stages  even  if  information  is  incomplete  or  unavailable. 

Fig. 4. Flowsheet generated for Fazenda Brasileiro by IntelliGold.


ACKNOWLEDGEMENT 

The  authors  would  like  to  thank  their  supporting  institutions  and  organisations:  The  University  of  British 
Columbia, Escola Politécnica da Universidade de São Paulo and Companhia Vale do Rio Doce.


REFERENCES 

1. Nardi, R.P., 1996. Revisão crítica do circuito de cianetação de Fazenda Brasileiro. São Paulo, Escola
Politécnica da Universidade de São Paulo (Qualifying examination).

2. Torres, V.M., 1996. Diagnóstico de Lixiviação para Minérios de Ouro. São Paulo, Escola Politécnica
da Universidade de São Paulo, dissertation (M.Sc.).

3. Meech, J.A., 1992. Managing uncertainty in expert systems - a fuzzy logic approach, in: 31st CIM
MetSoc. Conf. Proc., Edmonton, 77-85.

4. Lorenzen, L., 1995. Guidelines for designing a diagnostic leach experiment. Min. Eng., 8(3), 247-256.

5. Marsden, J., House, I., 1992. The chemistry of gold extraction. London, Ellis Horwood Limited.

6. Yannopoulos, J.C., 1991. The extractive metallurgy of gold. New York, Van Nostrand Reinhold.

7. Meech, J.A., Kumar, S., 1996. A hyper-manual on expert systems v. 5.0. CANMET (electronic book).

8. Paterson, C.J., 1990. Ore deposits for gold and silver. Min. Process. & Extract. Met. Rev. 6, 43-66.




Table  L  Input  Data  for  Case  Study  on  Fazenda  Brasileiro. 


Input information

Geology
    Probable type of deposit: archean greenstone belt gold-quartz veins (100%)
    Probable type of ore: Free milling (70%); Fe and As sulphide bearing (90%)

Mineralogy
    Chemical analysis: 2% As, 10% Fe, 2% S, 8.4 ppm gold
    Mineralogical analysis: 2% Arsenopyrite, 30% Oxides, 1% Pyrite, 60% Quartz
    Au characterisation: equipment: optical microscopy/electron microprobe;
    particle size (d95) = 37 µm; particle location: border (70%), fractures (100%),
    encapsulated (20%); shape: amorphous; association: Fe/As sulphides/quartz.

Metallurgical Testing
    Gravity concentration test with ground sample (d80 = 0.090 mm):
        55% recovery using a Knelson-type Concentrator;
    Cyanidation testwork with ground sample (d80 = 0.090 mm):
        95% recovery with 800 g/t NaCN, 6 hr. pre-aeration, 24 hr. leaching;
    Flotation testwork with ground sample (d80 = 0.090 mm):
        95% gold recovery and 10% weight recovery

Throughput and Grade
    Estimated gold grade of 5 g/t and plant throughput of 3200 t/day

Table  2.  Case  studies  used  to  validate  the  inference  of  process  behavior  from  mineralogy. 


Case study: Fazenda Brasileiro (real case without mineralogical information)
    Input - geology:
        Deposit type: archean greenstone belt, quartz veins
        Ore type: Fe-sulphide bearing (80%); As-sulphide bearing (80%)
    Input - mineralogy: None
    Inferred behaviour:
        gold is cyanidable if ground (78%)
        floatable for preconcentration (78%)
        gravity recoverable if ground (61%)
        high cyanide consumption (75%)
        some gold refractoriness (67%)
        refractory types: encapsulation
        As- and Fe-sulphides (100%)
        pyrrhotite/oxides/carbonates (75%)

Case study: Fazenda Brasileiro (real case with mineralogical information)
    Input - geology:
        Deposit type: archean greenstone belt, quartz veins
        Ore type: Fe-sulphide bearing (80%); As-sulphide bearing (80%)
    Input - mineralogy:
        Sample is representative (80%)
        Chemical analysis: 2% As, 10% Fe, 2% S, 8.4 g/t Au (S.D. = 0.2)
        Gangue mineralogy: 1% pyrite; 2% arsenopyrite; 30% quartz; 60% oxides
        Equipment used for mineralogy: optical microscopy + microprobe
        Gold mineralogy: particle size (d95): 37 µm; gold in fractures, borders (70%);
        encapsulated (20%); individual (50%); with sulphides and quartz (70%); amorphous shape
    Inferred behaviour:
        gold cyanidable if ground (true)
        floatable for preconcentration (true)
        gravity recoverable if ground (57%)
        high cyanide consumption (69%)

Case study: Carlin-type deposit (hypothetical case with carbonaceous ore)
    Input - geology:
        Deposit type: epithermal gold-silver, sediment hosted
        Ore type: carbonaceous
    Input - mineralogy: None
    Inferred behaviour:
        gold cyanidable if ground (64%)
        high cyanide consumption (75%)
        weak preg-robbing effect (70%)
        floatable for preconcentration (64%)
        some gold refractoriness (70%)
        refractory types: carbonaceous, pyrite, arsenopyrite

Case study: Carlin-type deposit with carbonaceous ore type and information that Au is encapsulated
    Input - geology:
        Deposit type: epithermal gold-silver, sediment hosted
        Ore type: carbonaceous
    Input - mineralogy:
        Sample is representative (80%)
        Gold mineralogy: encapsulated gold (100%), no gold in borders, fractures or individual particles
    Inferred behaviour:
        gold cyanidable if ground (59%)
        high cyanide consumption (69%)
        weak preg-robbing effect (65%)
        floatable for preconcentration (59%)
        total gold refractoriness (92%)
        partial gold refractoriness (65%)
        refractory types: carbonaceous, pyrite, arsenopyrite

Case study: Witwatersrand type deposit
    Input - geology:
        Deposit type: paleoplacer quartz-pebble conglomerate
        Ore type: free milling
    Input - mineralogy: None
    Inferred behaviour:
        gold cyanidable if ground (79%)
        gold is gravity-recoverable (78%)

(Note:  if  no  measured  behaviour  is  available,  the  combined  degree  of  belief  will  be  the  average  of  the  DoB  in  brackets  and  50%) 




A  Hardware  Design  for  Real-Time 
Multiple  Target  Tracking 

Frederick  Ferguson  and  Chandra  Curtis 

Center  for  Aerospace  Research,  North  Carolina  A&T  State  University 

Greensboro,  NC  27411 


ABSTRACT 

This paper describes the use of a simulated real-time system comprising a feed-forward neural network, a
recurrent neural network and a set of expert rules in solving the problem of Multiple Target Tracking. It is
assumed that data is provided in the form of blips, taken off 3 consecutive focal plane arrays, operating at
visible or infrared wavelengths. In this paper, the task of multiple target tracking is transformed from one of
blip-frame data association to one of target clustering, which in turn is broken down and solved in four stages.
Each stage is described and mapped with the use of a feed-forward neural network, a recurrent neural network
or a set of fuzzy rules. The first and second stages of the solution procedure involve the use of two
feed-forward neural network modules, while the third and fourth stages use a recurrent neural network module
and an expert rules module. The Multiple Target Tracking solution procedure is simulated through use of a
FORTRAN code. In principle the number of targets that can be tracked with the routine is unlimited. However,
in reality, the number of targets is dictated by the number of neurons, which in turn is constrained by hardware
requirements. Software simulation results show that the Multiple Target Tracking code is capable of
tracking an arbitrary number of targets very efficiently. The program was tested and debugged for use in the
tracking of sets of multiple targets, ranging from 2 to 14. Results indicated that once the average
acceleration of the targets is adequately evaluated, track files could be developed with 100% accuracy.


INTRODUCTION 

Air  and  Missile  Defense,  and  Battlefield  Awareness  face  unprecedented  challenges  in  target  detection  and 
tracking.  Traditional  single-sensor  systems  experience  unacceptable  performance  degradation  when  dealing 
with  multiple  targets  in  the  highly  mobile  environment  in  which  today's  forces  must  operate.  New 
developments  in  multiple  sensor  data  fusion  are  addressing  many  of  these  issues  and  advanced  techniques, 
such as pre-detection tracking, context-sensitive data association and tracking, adaptive data fusion and
tightly-coupled  fusion  and  sensor  management,  are  under  development. 


Fig.  1.  A  Typical  Multiple  Target  Problem  [1-3]. 






One  of  the  key  functions  performed  by  a  surveillance  system  is  to  keep  track  of  all  targets  of  interest  within 
the  coverage  region  of  its  sensors.  For  military  surveillance  systems,  the  coverage  region  generally  involves 
several  thousand  cubic  miles  containing  several  hundred  targets.  The  sensors  available  for  use  in  these 
surveillance  systems  operate  at  visible  or  infrared  wavelengths,  and  cannot  provide  perfect  information 
about all targets. In general, sensor measurements of targets tend to be ambiguous (lack of knowledge as to
which target was reported), incorrect (reports of false targets), and imprecise (random errors in the
measurements) [4]. A typical multiple target-tracking scenario of interest to our defense forces is illustrated
in Figure 1.


A  BRIEF  REVIEW  OF  MULTIPLE  TARGET  TRACKING  RESEARCH 

Multiple  target  tracking  (MTT)  is  an  essential  requirement  for  surveillance  systems  employing  one  or  more 
sensors  together  with  computer  subsystems  in  order  to  interpret  an  environment  that  includes  both  true 
targets  and  false  alarms.  The  objective  of  MTT  [1-6]  is  to  partition  the  sensor  data  into  sets  of  observations 
or tracks produced by the same source. The crux of MTT is to carry out the data association of
measurements whose origin is uncertain due to one or all of the following reasons:

i.  random  false  alarms  in  the  detection  process; 

ii.  clutter  due  to  spurious  reflectors  or  radiators  near  the  target  of  interest; 

iii.  interfering  targets; 

iv.  decoys  or  other  counter  measures. 

The  three  basic  methods  for  data  association  are  as  follows  [1-8]: 

i.  Nearest  Neighbor  (NN)  method,  which  is  computationally  efficient  but  unreliable  for 
tracking  targets  in  a  highly  cluttered  environment. 

ii.  Joint Probabilistic Data Association (JPDA), which has its own shortcomings. In tracking
closely  spaced  targets,  JPDA  delivers  poor  performance  because  of  the  persistent 
interference  from  neighboring  targets.  Also,  when  the  problem  size  increases,  the  required 
computation  increases  exponentially. 

iii.  Multiple Hypotheses Tracking (MHT), which is a multiple-scan method whose memory and
computation requirements both increase exponentially with problem size.
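
The Nearest Neighbor method listed first can be sketched in a few lines. This is a generic greedy variant with a distance gate, given only to illustrate the idea of data association; it is not the neural-network procedure developed in this paper, and the function name, the gate parameter and the example coordinates are assumptions.

```python
import math


def nearest_neighbor_associate(predicted, blips, gate):
    """Greedy nearest-neighbor association of blips to tracks.

    predicted: dict mapping a track id to its predicted (x, y) position.
    blips: list of measured (x, y) positions in the current frame.
    gate: maximum distance at which a blip may be assigned to a track.
    Returns a dict mapping each track id to a blip index, or None if no
    unused blip falls inside the gate.
    """
    candidates = sorted(
        (math.dist(position, blip), track_id, j)
        for track_id, position in predicted.items()
        for j, blip in enumerate(blips)
    )
    assignment = {track_id: None for track_id in predicted}
    used = set()
    for distance, track_id, j in candidates:
        if distance > gate:
            break  # remaining pairs are even farther apart
        if assignment[track_id] is None and j not in used:
            assignment[track_id] = j
            used.add(j)
    return assignment


# Two tracks, three blips; the third blip is far from both (a false alarm).
predicted = {"A": (0.0, 0.0), "B": (10.0, 0.0)}
blips = [(9.0, 0.0), (1.0, 0.0), (50.0, 50.0)]
print(nearest_neighbor_associate(predicted, blips, gate=2.0))  # {'A': 1, 'B': 0}
```

The sketch also shows why the method degrades in clutter: a false blip that happens to fall closer to a predicted position than the true return is accepted without any check against alternative hypotheses.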


At the Center of Aerospace Research at North Carolina A&T State University, research in MTT has been
focused on the NN method. Sensor data is provided in the form of blips, taken from three consecutive focal
plane arrays operating at visible or infrared wavelengths. A typical RF-communication unit capable of
providing the required data, and currently used to characterize sea and surface targets, is illustrated in Figure
2. Our goal is to develop an algorithm that simultaneously traces the paths described by multiple targets in


Fig.  2.  A  typical  RF-communication  unit  used  to  characterize  sea  and  surface  targets. 




real time as they traverse a given domain. The algorithm of interest in this study involves the following
paradigms: a feed-forward neural network, a recurrent neural network, and a set of fuzzy rules.
An advantage of this algorithm is that it can be implemented on digital signal processors
and used in real-time applications.


REAL-TIME  MULTIPLE  TARGET  TRACKING 

In a multiple target environment, surveillance systems are faced with the prospect of identifying and tracking
individual targets, based on models developed from sensor reports and target kinematics. The best choice of
target and sensor models will depend on the specified missions or applications. Regardless of the
choice of sensors and target models, the essential capability of any multiple target tracking
algorithm is its ability to trace the location of all objects of interest over specified time intervals.
A simplified view of the multiple target tracking problem is shown in Figure 1.

Over time, target indications move across the field of view of the sensor, and the targets must be
clustered by their location. To accomplish this task, any multiple target-tracking algorithm must identify,
discriminate and locate all targets of interest at specified time instants. However, sensors can only provide
focal plane array data of ambiguous targets, or blips (i.e., without specific knowledge as to which target
was reported), at discrete time intervals. In this analysis, it is also assumed that information on the target
location (x, y, z) and target speed is available with a time resolution of 1/50 second. Systems capable of
meeting these requirements [1-4], as illustrated in Figures 1 and 2, are available off the shelf.
Moreover, the surveillance system illustrated in Figure 2 can deliver information on target length, shape,
speed, internal motion, sea spectrum and estimated weight. The most common sensors of this type are focal
plane arrays, operating at visible or infrared wavelengths. In this paper, it is assumed that each target is
tracked in a 256x256 focal plane array, in which blips are represented in Cartesian coordinates.
A proposed field of view, which includes hardware assembly and data frame, is illustrated
in Figure 3.


Fig.  3.  Illustration  of  Observation  Domain. 

The type of preprocessing that is necessary depends greatly on whether the input blip data is already
ordered in some fashion. Typical focal plane imagers deliver data in row-scan order. In this paper it is assumed
that blip data is extracted and labeled pixel by pixel, following the rule of fixed row with varying columns.
Only the pixels containing objects are labeled. Results of the blip/frame data extraction using this method
are illustrated in Table 1.


Table 1: Typical Ambiguous Data Clusters

Blip Label    Blip x Location    Blip y Location
1             x1                 y1
2             x2                 y2
n             xn                 yn
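
The row-scan labeling rule described above (fixed row, varying columns, only occupied pixels labeled) can be
sketched as follows. This is an illustration rather than the authors' code: the function name, the detection
threshold, and the choice of x as the column index and y as the row index are assumptions.

```python
import numpy as np


def extract_blips(frame, threshold=0.0):
    """Label occupied pixels of a focal plane frame in row-scan order.

    frame:     2-D array (e.g. 256x256) of pixel intensities
    threshold: hypothetical detection level; pixels above it count as blips
    Returns a list of (label, x, y) rows in the layout of Table 1,
    with x taken as the column index and y as the row index (an assumption).
    """
    # np.argwhere returns (row, col) pairs in row-major order, i.e. each row
    # is scanned across its columns before moving to the next row.
    occupied = np.argwhere(np.asarray(frame) > threshold)
    return [(label, int(col), int(row))
            for label, (row, col) in enumerate(occupied, start=1)]
```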



DATA  PROCESSING  DURING  MULTIPLE  TARGET  TRACKING 

The task of real-time multiple target tracking is equivalent to the re-organization of the input blip/frame data
into track files. The mapping of input to output is based on the use of three mathematical
paradigms, namely, the feed-forward neural network, the recurrent neural network, and a set of fuzzy rules.
The use of these paradigms in the multiple target tracking algorithm presented herein is described in the
following five sub-sections. The mapping of the input data to track files, and the details of the mapping
process, are illustrated in Figure 4.


[Figure 4 diagram: blip coordinates (x, y) from frames 1, 2 and 3, at times t-dt, t and t+dt, pass through
the MTT mapping stages: Acceleration, Probability, Track Association.]


Fig.  4.  The  three  stages  of  input  to  output  MTT  data  mapping. 


Elementary  Data  Unit  for  Multiple  Target  Tracking 

The algorithm described herein is based on three frames of data, provided by a sensor capable of
mapping a two-dimensional image of the observation domain. No depth information is needed, only speed
and/or acceleration. Figure 5 illustrates a single data plane and its use with its previous and next neighbors
to form a basic unit of three consecutive data frames from which track files are developed.


Fig.  5.  Consecutive  arrays  of  ambiguous  targets. 


Track files contain information about the targets and their location history. The successful tracking of a
given target, specified by the ith blip in frame 2, involves the identification of two corresponding blips located
in frame 1 and frame 3, respectively. When the ith blip in frame 2 is combined with the identified blips in
frame 1 and frame 3, the result is the most likely path of the target from time t-dt through time t+dt.




Notation  Convention  for  the  Multiple  Target  Tracking  Process 

In Figure 5, and throughout this paper, a convention on the use of the indices i, j, and k is adopted:
i, i = 1, ..., N1, represents any blip in frame 1, at time t-dt; k, k = 1, ..., N2, represents any
blip in frame 2, at time t; and j, j = 1, ..., N3, represents any blip in frame 3, at time t+dt. In addition,
the indices i, j, and k carry a special implication when used to label a vector. Under this
convention, any parameter or variable labeled by the indices i, j, and k refers to all possible
combinations of the three indices as they vary independently from their minimum to maximum values. As a
result, any parameter labeled by the indices i, j, and k refers to a vector of n1n2n3 coordinates
(n1 = N1, n2 = N2, n3 = N3). A single coordinate of such a vector is written P_{m;ijk}, where the index
m varies from 1 through n1n2n3.

The notations adopted in this analysis are outlined as follows:

Convention # 1: P_{ijk} = vector of n1n2n3 coordinates
Convention # 2: P_{m;ijk} = mth coordinate of vector P_{ijk}
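
One way to realize the single index m is a fixed ordering of the (i, j, k) combinations. The paper does not
state which ordering is used, so the sketch below assumes a hypothetical one (i varies slowest, k fastest),
with i counting blips in frame 1, k in frame 2 and j in frame 3; the function names are illustrative.

```python
def flat_index(i, j, k, n1, n2, n3):
    """Map 1-based (i, j, k) to a 1-based coordinate index m in 1..n1*n2*n3.

    i runs over the n1 blips of frame 1, k over the n2 blips of frame 2,
    and j over the n3 blips of frame 3. The ordering is an assumption.
    """
    return ((i - 1) * n3 + (j - 1)) * n2 + k


def unflatten(m, n1, n2, n3):
    """Inverse of flat_index: recover 1-based (i, j, k) from m."""
    m0 = m - 1
    k = m0 % n2 + 1
    m0 //= n2
    j = m0 % n3 + 1
    i = m0 // n3 + 1
    return i, j, k
```

With n1 = 2, n2 = 3 and n3 = 4, the index m runs from 1 to 24 and every (i, j, k) combination appears
exactly once.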

Acceleration  and  Probability  Mapping  with  Feed-Forward  Neural  Networks 

In this analysis it is assumed that the time interval between frames is uniform and that the frame
rate is high enough for differentiation to be carried out by finite differences. With these assumptions, the
velocity vector, V, can be evaluated from any two consecutive frames, using the expression

$$V_{jk} = L_k - L_j \tag{1}$$

The vector L represents the two-dimensional blip coordinates, x and y. The indices j and k denote the
frames in which the coordinates are taken. The acceleration vector, A_{ijk}, on the other hand, is
evaluated from three frames, using the expression,
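
The expression for equation (2) is not legible in this copy. Under the stated assumptions (uniform frame
interval, finite differences), a standard second difference over the three frames would take the form below.
This is a sketch, not a recovered equation: the superscripts (1), (2), (3) are introduced here to label the
frames at t-dt, t and t+dt, and the frame interval is taken as the unit of time.

```latex
A_{ijk} = \left(L^{(3)}_{j} - L^{(2)}_{k}\right) - \left(L^{(2)}_{k} - L^{(1)}_{i}\right)
        = L^{(3)}_{j} - 2\,L^{(2)}_{k} + L^{(1)}_{i} \tag{2}
```

Here i, k and j select one blip each from frames 1, 2 and 3, following the notation convention above.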


In equations (1) and (2), the notation convention adopted earlier is used. The
acceleration vector, A_{ijk}, consists of n1n2n3 coordinates, each of which is evaluated using one of the
possible combinations of the indices i, j and k. Note that the vectors L, V and A carry
1, 2, and 3 subscripts, respectively: whereas the velocity vector can be described with
only 2 frames, the acceleration vector requires 3 frames.

Since this analysis deals with the 'angle-only', or two-dimensional, model for target tracking, the
acceleration vector, A, has two components, one along each axis of the Cartesian coordinate system. As
a result, the evaluation of the acceleration vector, A_{ijk}, can be conducted with the aid of two feed-forward
neural network modules, as follows:

$$A_{ijk} = \sqrt{\left(a^{x}_{ijk}\right)^{2} + \left(a^{y}_{ijk}\right)^{2}} \tag{3}$$

The parameters a^{x}_{ijk} and a^{y}_{ijk} represent the components of the acceleration vector. Equation (3)
can be evaluated with a single feed-forward neural network. However, for real-time on-board analysis,
two neural network modules are recommended.
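
The acceleration evaluation over all blip combinations can be sketched numerically as follows. This is a
plain finite-difference illustration, not the feed-forward network modules themselves; it assumes a second
difference over the three frames with the frame interval as the unit of time, and the function name and the
array layout [i, k, j] (frame 1, frame 2, frame 3) are choices made here.

```python
import numpy as np


def acceleration_magnitudes(frame1, frame2, frame3):
    """Acceleration magnitude for every combination of one blip per frame.

    frame1, frame2, frame3: sequences of (x, y) blip coordinates at
    t-dt, t and t+dt. Returns an array of shape (n1, n2, n3) whose entry
    [i, k, j] is |L3[j] - 2*L2[k] + L1[i]| (assumed second difference).
    """
    l1 = np.asarray(frame1, dtype=float)[:, None, None, :]
    l2 = np.asarray(frame2, dtype=float)[None, :, None, :]
    l3 = np.asarray(frame3, dtype=float)[None, None, :, :]

    # Broadcasting builds all n1*n2*n3 combinations at once;
    # the last axis holds the x and y components.
    a = l3 - 2.0 * l2 + l1
    return np.sqrt(a[..., 0] ** 2 + a[..., 1] ** 2)
```

A target moving at constant velocity yields a magnitude of zero for the correct combination of blips, which
is what makes the acceleration a useful association measure.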

Probability  Mapping  with  Feed-Forward  Neural  Networks 

The kinematic model used in this analysis is based on the acceleration association probability. Given a
mean acceleration value, \bar{A}, determined a priori for the targets of interest, the normal probability
density function for the acceleration vector gives




$$P_{ijk} = e^{-\left[A_{ijk} - \bar{A}\right]^{2}} \tag{4}$$

The vector P will be referred to as the acceleration association probability vector. Again, expression (4)
uses the notation convention adopted earlier. The acceleration association probability vector, P, consists
of n1n2n3 coordinates, P_{ijk}, each of which is evaluated using one of the possible combinations of the
indices i, j and k. The acceleration association probability coordinates can be evaluated in real time
with the aid of a feed-forward neural network module.
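
A direct numerical version of this mapping is sketched below. It assumes an unnormalized Gaussian with unit
variance centered on the mean acceleration; the variance, the function name and the parameter names are
assumptions, and the sketch stands in for, rather than reproduces, the neural network module.

```python
import numpy as np


def association_probability(acc, mean_acc):
    """Map acceleration magnitudes to association probabilities.

    acc:      array of acceleration magnitudes, one per (i, j, k) combination
    mean_acc: a priori mean acceleration of the targets of interest
    Uses exp(-(A - mean)^2), i.e. a unit-variance, unnormalized Gaussian
    (an assumption made for this sketch).
    """
    acc = np.asarray(acc, dtype=float)
    return np.exp(-(acc - mean_acc) ** 2)
```

Combinations whose acceleration is close to the expected mean score near 1, and implausible combinations
decay rapidly toward 0.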


Track  Association  using  Recurrent  Neural  Networks 

The recurrent neural network is a single-layer feed-forward network with feedback connections from
its output channels to its input channels. The dynamics of the recurrent network are
equivalent to the evaluation of the following equation:


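[Equation (5): update rule of the recurrent network for the track association vector; not recoverable
from the source scan.]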


where w_{lm} represents the network weights, P_{m;ijk} the input acceleration probabilities, t_{ijk} the
track association vector, and g the nonlinear activation, or threshold, function used in
this analysis. Once again, the notation convention adopted earlier is used in equation (5). In this report the
threshold function takes the form of a piecewise linear sigmoid function.
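
Since the exact form of equation (5) is difficult to read in this copy, the following is only a generic sketch
of a recurrent update of the kind described: the output is fed back through the weights, the input
probabilities are added, and a piecewise linear sigmoid is applied. The update form t <- g(W t + P), the
clipping range [0, 1], the stopping rule and all names are assumptions.

```python
import numpy as np


def piecewise_linear_sigmoid(u):
    """Piecewise linear threshold: 0 below 0, linear on [0, 1], 1 above 1."""
    return np.clip(u, 0.0, 1.0)


def run_recurrent_network(weights, probs, max_iters=100, tol=1e-6):
    """Iterate an assumed update t <- g(W t + P) until it stops changing.

    weights: (n, n) array of network weights
    probs:   length-n array of input association probabilities
    Returns the track association vector at convergence (or after max_iters).
    """
    w = np.asarray(weights, dtype=float)
    p = np.asarray(probs, dtype=float)
    t = p.copy()  # start from the input probabilities
    for _ in range(max_iters):
        t_next = piecewise_linear_sigmoid(w @ t + p)
        if np.max(np.abs(t_next - t)) < tol:
            return t_next
        t = t_next
    return t
```

With two mutually inhibiting candidate tracks (negative off-diagonal weights), the candidate with the higher
input probability stays on and the other is driven to zero.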

The first thing to be defined when using the recurrent neural network is the set of network weights. Typically,
the network weights are determined by differentiating the network energy, or cost function, E, with respect
to the acceleration association probability. The shape and state of this energy function are determined by the
network's initial input and the interconnection constraints and strengths of its neurons, as follows:

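$$E = \beta \sum_{ijk} \sum_{i'j'k'} \chi\left(\delta_{ii'}, \delta_{jj'}, \delta_{kk'}\right) t_{ijk}\, t_{i'j'k'} + \alpha \sum_{ijk} p_{ijk}\, t_{ijk} \tag{6}$$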


Here t_{ijk} represents the tracks, p_{ijk} defines the probability of the ijk track, \delta is the Kronecker delta
function, \alpha and \beta are the optimization parameters, and \chi is the Exclusive OR function, which
is defined as follows:


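$$\chi(x, y, z) = \begin{cases} 1, & \text{if } x, y = 1,\ z \neq 1;\ \text{or } x, z = 1,\ y \neq 1;\ \text{or } y, z = 1,\ x \neq 1 \\ 0, & \text{otherwise} \end{cases} \tag{7}$$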
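
Read literally, equation (7) returns 1 when exactly two of its three arguments equal 1. A minimal sketch
follows; the function names are illustrative, and applying the function to Kronecker deltas of two candidate
tracks follows the description of the energy function in the text.

```python
def kronecker_delta(a, b):
    """Return 1 if a equals b, otherwise 0."""
    return 1 if a == b else 0


def chi(x, y, z):
    """Exclusive OR function of equation (7): 1 when exactly two arguments are 1."""
    return 1 if (x == 1) + (y == 1) + (z == 1) == 2 else 0


def tracks_conflict(track_a, track_b):
    """Apply chi to the Kronecker deltas of two candidate tracks (i, j, k)."""
    (i, j, k), (i2, j2, k2) = track_a, track_b
    return chi(kronecker_delta(i, i2), kronecker_delta(j, j2), kronecker_delta(k, k2))
```

For example, candidate tracks (1, 2, 3) and (1, 2, 4) agree on two of their three blips, so the function
returns 1 for that pair, while a track compared with itself returns 0.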


When equation (6) is differentiated twice with respect to t_{m;ijk}, the appropriate weights for the recurrent
neural network are found in the form


$$w_{lm} = -\beta \left[\chi\left(\delta_{ii'}, \delta_{jj'}, \delta_{kk'}\right)\right]_{lm} \tag{8}$$


The range of values for \beta used in this analysis is 0 < \beta < 0.50. Once activated, the network converges
toward a minimum of the energy function in which some neurons are on and others are off. The set of
neurons that are on at convergence represents the group of tracks that are part of the final solution. Within the
context of the multiple target tracking problem, the network weights also enforce the constraint of having
one corresponding blip per frame per track.

Association  Probabilities  to  Track  Files  Mapping 

Once  activated  the  recurrent  neural  network  converges  toward  a  minimum  of  the  energy  function,  E,  in 
which  some  neurons  are  on  and  others  are  off.  The  set  of  neurons  that  are  on  at  convergence  represents  the 




group of tracks that are part of a final solution. At convergence, the track vector, t_{ijk}, of n1n2n3
coordinates will have values clustered around zero and one. Only the coordinates with values
clustered in the upper range are of interest. The next step in the evaluation process is to identify the
coordinates with values near unity and to represent these tracks in terms of
targets located in frames 1, 2, and 3.
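
The final selection step can be sketched as a simple threshold on the converged track vector. The 0.5
threshold, the function name, and the array layout [frame 1, frame 2, frame 3] are assumptions made for
this illustration; the paper's expert-rule module is not reproduced here.

```python
import numpy as np


def tracks_from_activations(t, threshold=0.5):
    """Select the 'on' neurons and report them as blip triples.

    t:         array of shape (n1, n2, n3) holding the converged track vector,
               indexed by blip number in frame 1, frame 2 and frame 3
    threshold: hypothetical cut separating values near one from values near zero
    Returns a list of 1-based (blip_in_frame1, blip_in_frame2, blip_in_frame3).
    """
    on = np.argwhere(np.asarray(t) > threshold)
    return [tuple(int(v) + 1 for v in row) for row in on]
```

Each returned triple names one blip per frame, which is the information a track file needs for the interval
from t-dt to t+dt.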

TECHNICAL  RESULTS 

When properly initiated, the simulated real-time multiple target tracker accepts data in the form of 3
consecutive blip-frame data sets. The data is processed and track files are developed. A FORTRAN code
was developed to simulate the functions of the two feed-forward neural networks, the recurrent neural
network and the set of expert rules modules. The code for multiple target tracking is programmed according to
the flow chart illustrated in Figure 4. Results from observing the tracking algorithm at work with four
objects on a 256x256 focal plane array over 30 frames are illustrated in Figure 6. As indicated by these
results, the multiple target-tracking algorithm is capable of tracing very complex movements, even when
the target paths cross over.


Multiple  Target  Tracking,  256x256,  30  Frames  and  4  Targets 


Fig. 6. Results from tracking four targets.

In principle, the number of targets that can be tracked with this algorithm is unlimited. In practice, however,
the number of targets is dictated by the number of neurons, which in turn is constrained by hardware
requirements. Software simulation results showed that the MTT algorithm is capable of tracking an
arbitrary number of targets very efficiently. The program was tested and debugged for use in tracking
sets of multiple targets ranging from 3 to 14. Preliminary results indicated that once the average
acceleration of the targets is adequately evaluated, track files can be developed with 100% accuracy.


REFERENCES 

1. R.E. Bethel and G.J. Paras, A PDF Multisensor Multitarget Tracker. IEEE Log No. T-AES/34/1/00185.

2. D.R. Zahirniak, D.L. Sharpin and T.W. Fields, A Hardware-Efficient, Multirate, Digital Channelized
Receiver Architecture. IEEE Log No. T-AES/34/1/00184.

3. P.E. Pace, B.H. Nishimura, W.M. Morris and R.E. Surratt, Effectiveness Calculations in Captive-Carry
HIL Missile Simulator Experiments. IEEE Log No. T-AES/34/1/00183.

4. E. Mazor, A. Averbuch, Y. Bar-Shalom and J. Dayan, Interacting Multiple Model Methods in Target
Tracking: A Survey. IEEE Log No. T-AES/34/1/00182.

5. B. Armstrong and B.S. Holeman, Target Tracking with a Network of Doppler Radars. IEEE Log No.
T-AES/34/1/00176.

6. R.K. Saha and K.C. Chang, An Efficient Algorithm for Multisensor Track Fusion. IEEE Log No.
T-AES/34/1/00189.




7. I. Ishii, Y. Nakabo and M. Ishikawa, Target Tracking Algorithm for 1ms Visual Feedback System Using
Massively Parallel Processing. Proc. IEEE Int. Conf. Robotics and Automation (Minneapolis, 1996.4.25),
pp. 2309-2314.

8. J.F. Pusztaszeri, P.E. Rensing and T.M. Liebling, 1996. Tracking elementary particles near their primary
vertex: a combinatorial approach. Journal of Global Optimization, 9: 41-64.




Low-Cost  Supersonic  Missile  Inlet  Fabrication  Technique 

C.S.  Cornelius*,  D.A.  Gibson** 


*U.S.  Army  Aviation  and  Missile  Command 
Missile  Research,  Development,  and  Engineering  Center 
Redstone  Arsenal,  Alabama  USA 
**Nichols  Research  Corporation,  Huntsville,  Alabama  USA 


ABSTRACT 

This paper presents a technique that allows for the fabrication of complex shapes to high accuracy without
the expense of conventional machining. Using this new approach in place of conventional fabrication
methods resulted in an 85% reduction in fabrication costs. The technique
was demonstrated in the fabrication of supersonic inlets for a ramjet engine missile. A single
computer-aided design (CAD) model was used for design and hardware integration, generation of the rapid
prototyping computer file for producing a mandrel, finite element modeling of the inlet, development of the
final machining tool paths and inspection of the mandrels and finished inlets. The development of this
integrated single master model and stereolithography plating process is a major step forward in the area of
missile component prototyping and frees the designer to approach component design and fabrication with a
new, accurate and relatively inexpensive tool. The savings realized from the use of this technique are
directly applicable to the manufacturing of other complex-shaped components for both military and
commercial applications.

INTRODUCTION 

Design requirements and goals for any future Army surface-to-surface or surface-to-air missile systems
may include extended range and enhanced payload capabilities that can be satisfied only by air-breathing
missiles. In general, supersonic air-breathing missiles are much more expensive than traditional all-solid
motor missiles due to increases in integration complexity and the high tolerances required for the air and
fuel handling subsystems. Inlets for supersonic air-breathing missiles are normally built to wind tunnel
standards, with high accuracy and extremely fine surface finishes. Highly accurate components are critical
during the system development process to ensure that the data acquired during testing is relatively
unaffected by normal fabrication uncertainties. However, if a future supersonic air-breathing missile
system is to be fielded, the costs associated with the model-shop precision employed during development
become unsupportable even for moderate rate production. It has become apparent that low-cost methods
for fabrication of high tolerance components, such as inlets, are needed and that low cost should have as
much emphasis as high performance early in the missile development cycle.


SINGLE  MASTER  MODEL  DEVELOPMENT 

To incorporate design features that are consistent with low-cost manufacturing techniques early in a
development, it is crucial that the designer understand the computer design and analysis tools available and
make efficient use of a single model for many purposes. In the development of the low-cost inlet, a single
parameter-based, three-dimensional computer-aided design model was used to design the inlet interior and
exterior configuration. For the application studied, the inlet design was a conformal 3-D geometry that
wrapped around the exterior of the missile fuselage for increased launcher packaging density. The inlet
shape was therefore extremely complex and afforded no real opportunities for fabrication by normal
machining methods. The inlet would have to operate in Mach 3 sea-level airflow conditions, necessitating
careful analysis of material and joining methods for multiple inlet pieces. Since the geometry of the inlet
interior was a non-reentrant shape, and because of the high operating temperatures, it was decided that the
entire inlet should be built as a single part.


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




THE  TECHNIQUE 

A novel fabrication technique was developed to utilize in-house rapid prototyping capabilities. A summary
of the technique is shown in Figure 2. A computer-generated model of the inlet was created and used to
construct a mandrel based on the inside surfaces of the inlet. Due to the dimensional limitations of the
stereolithography equipment and the length of the mandrel required for the inlet, the mandrel model was
divided into three interconnecting pieces and an .STL file of each section was produced. The .STL file is
the format needed to provide the stereolithography equipment with the data necessary to create the model.
The three mandrel pieces were then aligned, joined, faired and smoothed to the required interior surface
finish. The completed mandrels were then checked by automated inspection equipment against the mandrel
computer file for acceptance.


Fig.  1.  Single  master  computer  aided  design  model  was  used  for  many  purposes. 

The  master  inlet  computer  model  was  also  used  to  define  the  inlet  structure  and  interfaces  with  the  missile 
and  test  equipment.  A  file  of  the  inlet  model  was  exported  to  a  finite  element  program  for  structural 
analysis  of  the  inlet.  The  structural  analysis  was  used  to  determine  the  minimum  inlet  thickness  required  to 
withstand  the  test  conditions  with  an  appropriate  margin  of  safety. 

The  plastic  mandrels  were  surface-coated  with  a  thin  silver  paint  and  allowed  to  cure.  Electrical 
connections  were  integrated  onto  the  mandrel  base  and  pure  nickel  was  electrically  deposited  on  the 
mandrel  surface.  The  individual  inlet  platings  were  monitored  periodically  to  determine  shell  thickness. 
The  plating  process  was  continued  until  the  minimum  required  thickness  to  withstand  the  flight  conditions 
was  achieved. 

Based  on  the  information  from  the  master  inlet  model,  the  rough  inlets  were  then  machined  on  one  end  to 
establish  the  missile  attachment  flange  interface.  An  attachment  flange  was  welded  to  the  inlet  using 
specially  designed  welding  fixtures  to  ensure  proper  flange  and  inlet  alignment.  An  IGES  file  generated 
from  the  master  inlet  computer  model  was  used  to  provide  the  tool  paths  for  a  5-axis  CNC  milling  machine 
to  machine  the  flange,  exterior  inlet  cowl  shape  and  leading  edges.  Minimal  hand  fairing  of  the  cowl  and 




lip  was  required.  The  IGES  file  was  then  used  by  automated  inspection  equipment  to  check  the  completed 
cowl  and  leading  edge  location,  shape  and  dimensions.  The  surface  finish  of  the  interior  surfaces  ranged 
between  16  and  23  micro-inches. 

Views  of  the  assembled  three-piece  stereolithography  inlet  mandrel,  the  mandrel  prepared  for  plating  and 
the  completed  supersonic  inlet  are  shown  in  Figures  3  through  5. 


Fig.  3.  Assembled  mandrel  with  integrated  pressure  instrumentation  features. 


Fig.  4.  Mandrel  preparation  prior  to  plating. 


Fig.  5.  Completed  supersonic  conformal  inlet. 




TESTING  OF  COMPLETED  HARDWARE 

The completed inlets were installed on a full-scale ramjet missile and subjected to flight conditions. The
inlets operated as designed under extremely harsh temperature and loading conditions and met all design
and performance requirements. Performance of the inlets was compared to that of wind tunnel inlets built
in a model shop. The data indicated that the performance of the "low cost" inlets and the expensive wind
tunnel models was essentially the same.


PROGRAM  COST  SAVINGS 

The inlets fabricated using the low-cost technique performed essentially as well as the very expensive wind
tunnel model inlets at a much lower cost. The cost of an instrumented wind tunnel inlet was approximately
$100K when built using conventional high precision fabrication techniques. The total cost, including
mandrel preparation, plating and final machining, for the low-cost inlet was $15K each - a savings of 85%
without a significant reduction in overall performance.
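
The quoted savings follow directly from the two unit costs given above. A minimal check is sketched
below; the helper name is hypothetical and the inputs are the approximate figures stated in the text.

```python
def percent_savings(baseline_cost, new_cost):
    """Percentage reduction of new_cost relative to baseline_cost."""
    return 100.0 * (baseline_cost - new_cost) / baseline_cost


# Approximate unit costs from the text: $100K conventional, $15K low-cost.
savings = percent_savings(100_000, 15_000)
```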


CONCLUSION 

This was the first time that this or a similar rapid prototyping technique was used to fabricate a supersonic
inlet or structural element of a missile that was then subjected to supersonic flight loads. The technique
employed a single master inlet computer-aided design (CAD) model for design and test hardware integration,
generation of the rapid prototyping computer file for producing a mandrel, finite element modeling,
determination of the tool paths for final machining, and inspection of the mandrels and finished inlets. The
development of this process is a major step forward in the area of missile component prototyping and frees
the designer to approach mechanical and electrical component design and fabrication with a new, accurate
and relatively inexpensive tool. The savings realized from the use of this technique are directly applicable
to the manufacturing of other complex-shaped components for both military and commercial applications.




Design  of  High  Performance  Missile  Structures  Utilizing 
Advanced  Composite  Material  Technologies 

J.  R.  Esslinger,  R.  N.  Evans,  and  G.  W.  Snyder 

Propulsion  and  Structures  Directorate, 

Missile  Research,  Development  and  Engineering  Center 
U.S.  Army  Aviation  &  Missile  Command,  Redstone  Arsenal,  Alabama 

ABSTRACT 

The  U.S.  Army  Aviation  and  Missile  Command  (AMCOM)  has  demonstrated  the  ability  to  develop  and 
utilize  advanced  composite  material  technologies  for  the  design  and  fabrication  of  hypervelocity  kinetic 
energy  missiles  for  the  next  generation  of  Army  air  defense  and  anti-tank  applications.  Future  kinetic 
energy  missiles  must  be  small,  fast,  lethal,  and  maneuverable,  which  requires  the  delivery  vehicles  to 
operate  in  a  severe  loading  environment.  Innovative  designs  and  manufacturing  techniques  have  been 
developed  to  provide  an  avenue  for  enhancing  propulsion  system  performance  while  significantly  reducing 
the  missile  size  and  mass  requirements.  Propulsion  units  with  high  strength-to-density  ratio  filament 
wound  composite  motorcases  are  stronger,  stiffer,  and  more  readily  producible  than  their  metallic 
counterparts;  however,  these  structures  are  susceptible  to  manufacturing  variability  and  are  more  easily 
damaged  during  handling  and  storage.  This  paper  will  discuss  the  AMCOM  motorcase  fabrication 
approach  and  its  applications  as  well  as  development  efforts  in  the  area  of  embedded  sensor  technology  for 
in-process  monitoring,  structural  characterization,  damage  detection,  and  service  life  monitoring  of 
filament  wound  composite  motorcases.  The  advanced  composite  material  applications  have  enabled  major 
improvements  in  System  Applications  for  Hypervelocity  Missile  concepts  and  integration  to  multiple 
lightweight  launch  platforms. 


SYSTEM  DESIGN  AND  DEVELOPMENT  CONSIDERATIONS 

The  goals  and  factors  influencing  future  U.S.  Armed  Forces  stress  the  need  for  rapidly  deployable 
continental  United  States  (CONUS)  based  forces  to  engage  regional  threats  promptly  in  decisive  combat  on 
a  global  basis.  Size  and  weight  are  paramount  factors  for  weapon  systems  supporting  this  future  force 
structure.  Hypervelocity  kinetic  energy  (KE)  missiles  offer  a  highly  viable  means  of  maintaining  weapon 
effectiveness  at  substantially  lower  weight  and  reduced  length  while  achieving  essentially  double  the  range 
of  KE  tank  gun  projectiles.  These  characteristics  are  particularly  important  for  missiles  fired  from  both 
armored  vehicles  and  air  vehicles  (helicopters).  Some  of  the  systems  technology  requirements  and 
developments  to  achieve  the  required  reductions  in  weight  and  size  are  briefly  addressed.  The  weapon 
system  development  objective  is  directly  linked  to  a  potential  future  lightweight  armored  vehicle  weapon 
system  that  utilizes  a  hypervelocity  kinetic  energy  missile  system  as  opposed  to  a  gun  launched  KE 
projectile  for  its  primary  kill  mechanism.  It  is  important  to  emphasize  the  need  for  well  planned  and 
executed  system  development  efforts  which  start  with  coordinated  technology  advancements  in  all 
supporting  disciplines;  however,  the  major  thrust  of  this  paper  is  the  utilization  of  advanced  lightweight 
composite  materials  technology  for  the  development  of  next  generation  hypervelocity  missile  concepts. 

Lethality 

The  primary  lethality  challenge  is  to  demonstrate  the  perforation  of  advanced  tank  armor,  such  as 
composite  steel/ceramic  armor  covered  with  increasingly  sophisticated  explosive  and  nonexplosive  reactive 
armor.  An  additional  challenge  is  to  establish  or  prove  the  lethality  of  a  range  of  reasonably  sized  Kinetic 
Energy  Penetrators  impacting  targets  with  more  than  25  MJ  of  kinetic  energy.  The  tradeoff  between 
penetrator  weight  and  impact  velocity  has  a  profound  effect  on  the  total  missile  weight  as  these  are  the 
primary  variables  influencing  the  design  of  the  propulsion  sub-system. 

Propulsion 

The solution of several propulsion technical challenges is crucial to developing future operational
hypervelocity missiles which would represent significant advances over current state-of-the-art. The desire


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




for small takeoff weight and volume drives the need for high specific impulse and energy density, while
maximizing the propellant weight fraction (PWF). Very short burn times and high mass flow rates are
required  to  compete  with  the  minimum  range  of  tank-launched  kinetic  energy  penetrators  and  to  decrease 
velocity  loss  due  to  drag.  Hypervelocity  missile  solid  rocket  propulsion  research  and  development  has 
focused  on  simultaneously  achieving  four  goals  which  are  widely  perceived  to  be  mutually  conflicting. 
The  four  goals  are: 

1)  high  specific  impulse  (>250  lbf-s/lbm) 

2) very high burn rate propellants (~3 inches/second)

3)  non-detonable  propellants 

4)  minimum  smoke  propellants 


Structures 

The  challenge  to  provide  increased  kinetic  energy  lethality  against  advanced  tank  armors,  at  minimum  and 
extended  ranges,  requires  a  high  performance  propulsion  unit  and  a  control  system  able  to  survive  the 
hypervelocity boost and coast phases. Filament-wound carbon-reinforced composite motor cases using high
strength-to-density ratio fibers and high performance propellant technology provide a baseline for achieving
the desired Mach 6.5 velocity at a range below 500 m in 0.6 seconds. Remarkable advances have been
made  with  carbon  reinforced  materials  in  filament  wound  or  braided  motor  cases.  Propellant  Weight 
Fractions  (PWF)  in  excess  of  80  %  have  been  demonstrated  in  a  motor  which  provides  30,000  pounds  of 
thrust  for  0.6  seconds.  Even  greater  PWF  designs  appear  to  be  feasible  for  these  high  pressure  rocket 
motors.  A  major  challenge  is  to  increase  the  maximum  service  temperature  of  processable  resin  materials 
from about 190 °C up to about that of steel (~400 °C). Success would significantly
reduce  the  amount  of  thermal  protection  material  in  the  centerbody  case,  thus  reducing  total  missile  mass. 
The  composite  motor  PWF  is  also  enhanced  by  using  lightweight  composite  nozzles  and/or  nozzle  inserts. 
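
As a rough cross-check of the figures in this section, the sketch below estimates total impulse, propellant weight, and motor weight. It assumes constant thrust over the burn and uses the stated goal and demonstration thresholds (250 lbf-s/lbm, PWF of 0.80) as if they were exact values. Both simplifications are assumptions, so the results are order-of-magnitude estimates rather than data from the demonstrated motor.

```python
def total_impulse_lbf_s(thrust_lbf: float, burn_time_s: float) -> float:
    """Total impulse for a constant-thrust burn (assumption: flat thrust curve)."""
    return thrust_lbf * burn_time_s


def propellant_weight_lbm(total_impulse: float, specific_impulse: float) -> float:
    """Propellant needed to deliver an impulse at a given Isp (lbf-s/lbm)."""
    return total_impulse / specific_impulse


def motor_weight_lbm(propellant_lbm: float, pwf: float) -> float:
    """Total propulsion weight implied by a propellant weight fraction."""
    return propellant_lbm / pwf


impulse = total_impulse_lbf_s(30_000.0, 0.6)      # thrust and burn time from the text
propellant = propellant_weight_lbm(impulse, 250.0)  # Isp goal threshold from the text
motor = motor_weight_lbm(propellant, 0.80)          # PWF threshold from the text

print(f"total impulse : {impulse:8.0f} lbf-s")
print(f"propellant    : {propellant:8.1f} lbm")
print(f"motor (total) : {motor:8.1f} lbm, inert {motor - propellant:.1f} lbm")
```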


PROPULSION  SYSTEM  OBJECTIVES 

The  U.S.  Army  Aviation  and  Missile  Command  has  developed  multi-mission  kinetic  energy  missile 
concepts  which  impact  their  targets  with  4-6  times  the  energy  of  conventional  tank  fired  projectiles.  These 
kinetic  energy  missiles  fulfill  a  close  combat  mission  role,  which  generally  includes  line  of  sight  targets  at 
ranges from 200 meters to 5000 meters. AMCOM's kinetic energy missiles reach a peak velocity of over 2
km/s  at  a  range  less  than  500  meters,  and  can  be  launched  from  multiple  light  platforms.  Peak  velocity 
goals  are  based  on  maximizing  the  penetration  performance  of  continuous  long  rod  kinetic  energy 
penetrators.  The  additional  energy  and  improved  penetration  performance  provides  increased  hole  volume 
in  armored  targets,  which  relates  to  higher  system  lethality  from  the  behind  armor  debris  spallation  and 
collateral  damage  to  the  vehicle. 

The  kinetic  energy  mission  pushes  propulsion  system  design  to  the  limit.  High  chamber  pressures  (>3000 
psi) combined with short burn times (<1 s) improve performance, but place significant stresses on the
structure.  High  strength  composite  materials  offer  the  potential  for  surviving  the  severe  operational  loading 
environment  while  reducing  the  inert  mass.  The  minimization  of  inert  mass  is  critical  for  kinetic  energy 
missiles  so  that  propulsive  energy  can  be  placed  into  lethal  mechanism  impact  energy  rather  than  into 
carrying  parasitic  weight  that  adds  little  or  no  effect  to  lethality.  In  addition  to  being  high  performance,  line 
of sight kinetic energy propulsion systems must have minimum signature and must be non-detonable to
meet  insensitive  munitions  requirements. 

The effectiveness and feasibility of the hypervelocity kinetic energy concept were evaluated as part of an
AMCOM  Missile  Research,  Development  and  Engineering  Center  technology  demonstration  program  for 
the  Advanced  Kinetic  Energy  Missile  (ADKEM).  ADKEM  utilized  a  four-clustered  booster  concept  in 
which  boost  motors  were  discarded  upon  burnout,  and  a  low-drag  penetrator  delivery  vehicle  was  guided  to 
the target under coast. A flight test of the ADKEM concept is shown in Fig. 1.




Fig.  1.  Advanced  Kinetic  Energy  Missile  (ADKEM)  Concept 


MOTORCASE  OPTIMIZATION 

In  developing  the  propulsion  units  for  the  ADKEM,  AMCOM  selected  the  ambitious  design  goal  of  a 
reduced smoke solid fueled motor with a propellant weight fraction (PWF) of 0.85. PWF is the ratio of
propellant  weight  to  total  propulsion  system  weight  and  is  an  important  measure  of  motorcase  design 
efficiency.  For  meeting  PWF  goals  and  operating  at  a  nominal  chamber  pressure  of  3400  psi,  a  filament 
wound  composite  case  utilizing  carbon  fiber  and  an  epoxy  matrix  offered  the  best  material  solution. 

Attachment  and  interface  structures  present  a  challenge  to  achieving  an  efficient  motorcase  design. 
AMCOM  investigated  methods  to  eliminate  or  reduce  the  weight  of  nozzle  and  motor  closure  attachment 
mechanisms.  AMCOM  has  adopted  the  practice  of  integrally  winding  a  composite  nozzle  insulator  to 
eliminate  an  aft  interface  structure  and  reduce  nozzle  weight.  Head  closures  for  tactical  composite 
motorcases  are  traditionally  mechanically  fastened  inside  a  full  diameter  opening  at  the  forward  end  of  the 
case.  In  previous  AMCOM  filament  wound  motorcases  with  full  diameter  forward  openings  and  integrally 
wound nozzles, the highest PWF obtained was 0.80. The ADKEM design replaced the traditional forward
joint  with  an  integrally  wound  polar  boss.  The  resulting  3.75  inch  diameter  unitary  motorcase  design, 
shown  in  Fig.  2,  achieved  a  PWF  of  0.84. 
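
To show what a few points of PWF mean in practice, the sketch below converts PWF into inert weight per unit of propellant, using the definition given above (propellant weight divided by total propulsion system weight). The PWF values are those quoted in the text. The comparison at fixed propellant weight is an assumption made for illustration.

```python
def inert_per_unit_propellant(pwf: float) -> float:
    """Inert weight carried per unit propellant weight: (1 - PWF) / PWF."""
    return 1.0 / pwf - 1.0


def inert_reduction(pwf_old: float, pwf_new: float) -> float:
    """Fractional inert-weight saving at fixed propellant weight."""
    return 1.0 - inert_per_unit_propellant(pwf_new) / inert_per_unit_propellant(pwf_old)


for label, pwf in (("previous best", 0.80), ("ADKEM achieved", 0.84), ("ADKEM goal", 0.85)):
    print(f"{label:15s} PWF {pwf:.2f} -> inert/propellant = {inert_per_unit_propellant(pwf):.3f}")

print(f"inert saving, 0.80 -> 0.84: {inert_reduction(0.80, 0.84):.1%}")
```

Under this assumption, moving from a PWF of 0.80 to 0.84 removes a little under a quarter of the inert weight for the same propellant load.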


[Figure 2 labels: polar adapter; Kevlar filled rubber insulation, 0.030 in thick; integrally wound silica phenolic nozzle insert]


Fig.  2.  ADKEM  Motor  Design  Layout 

The  fabrication  techniques  developed  for  ADKEM  are  unique  in  that  they  provide  precision  alignments 
about  the  centerline  of  the  motor  and  eliminate  post  processes  such  as  application  of  internal  insulation  and 
joining. The issue of tooling for the closed geometry of ADKEM was addressed through an expendable
filament  winding  tool  and  a  collapsible  tool  for  propellant  casting.  Kevlar  filled  polyisoprene  was  selected 
for  use  as  internal  insulation  between  the  propellant  grain  and  carbon/epoxy  case.  The  insulation  was 
applied  to  the  mandrel  before  winding  and  was  co-cured  with  the  motorcase.  The  case  was  fabricated  using 
90  degree  hoop  layers  to  carry  radial  pressure  and  40  degree  helical  layers  to  carry  axial  loads  and  to  retain 
the  forward  polar  boss  and  the  nozzle  insulator.  Fig.  3  shows  insulation  application,  the  winding  of  a  helical 
layer  over  the  polar  boss,  a  compression  molded  silica  phenolic  nozzle  with  non-eroding  insert,  and  the 
integration  of  the  nozzle  with  the  case  structure. 
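
The split between hoop and helical layers can be illustrated with a classical netting analysis, in which the fibers alone carry the load of a closed-end cylinder under internal pressure. The paper does not state which sizing method was used, so this is a first-order sketch under stated assumptions. The 3400 psi pressure, 3.75 inch diameter, and 40 degree helical angle come from the text. The allowable fiber stress is a hypothetical value chosen for illustration. Polar openings, resin, and nozzle loads are ignored.

```python
import math


def netting_thicknesses(pressure_psi: float, radius_in: float,
                        helical_angle_deg: float, fiber_stress_psi: float):
    """Return (helical, hoop) fiber thicknesses from netting analysis.

    Axial running load  N_x = P R / 2 is carried by the helical layers only.
    Hoop running load   N_h = P R is shared by helical and 90 degree layers.
    """
    alpha = math.radians(helical_angle_deg)
    n_axial = pressure_psi * radius_in / 2.0
    n_hoop = pressure_psi * radius_in
    t_helical = n_axial / (fiber_stress_psi * math.cos(alpha) ** 2)
    hoop_from_helical = fiber_stress_psi * t_helical * math.sin(alpha) ** 2
    t_hoop = (n_hoop - hoop_from_helical) / fiber_stress_psi
    return t_helical, t_hoop


ASSUMED_FIBER_STRESS_PSI = 400_000.0  # hypothetical allowable, not from the paper

t_helical, t_hoop = netting_thicknesses(3400.0, 3.75 / 2.0, 40.0, ASSUMED_FIBER_STRESS_PSI)
print(f"helical fiber thickness: {t_helical:.4f} in")
print(f"hoop fiber thickness   : {t_hoop:.4f} in")
```

At 40 degrees the helical layers already supply part of the hoop strength, so the 90 degree layers only need to make up the remainder.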


Fig.  3.  ADKEM  Motorcase  Fabrication 


INNOVATIVE  MATERIAL  SOLUTIONS 

Filament  wound  composites  offer  design  flexibility  unattainable  with  conventional  materials.  AMCOM  has 
used  this  flexibility  to  implement  innovative  approaches  that  address  critical  issues  related  to  tactical 
missiles.  One  of  these  critical  areas  is  the  improvement  of  missile  and  launch  platform  integration  by 
reducing the missile envelope to increase the number of stowed kills, i.e., the number of missiles that can be
stowed on a given launch platform.

Since  tactical  missiles  are  likely  to  be  subjected  to  various  environmental  and  handling  extremes  and 
composite  structures  face  the  effects  of  manufacturing  variability,  another  critical  area  is  service  life 
monitoring.  The  Army  is  currently  researching  the  use  of  embedded  fiber  optic  sensor  arrays  for 
monitoring  filament  wound  structures  from  manufacturing  to  final  use. 

Annular  Motorcase  Development 

In  an  effort  to  mitigate  certain  risk  areas  and  more  efficiently  package  the  ADKEM,  the  UNICORN 
alternate  propulsion  unit  was  envisioned.  The  4  ADKEM  boost  motors  were  replaced  with  a  single  motor 
with  a  rod  and  tube  propellant  grain  configuration.  Fig.  4  shows  a  sketch  of  the  UNICORN  missile  concept. 

UNICORN  offers  several  advantages  over  the  clustered  ADKEM  configuration.  The  missile  base  diameter 
was reduced from 10.6 inches to 8.6 inches, representing a 33 percent reduction in missile frontal cross
sectional  area  and  an  increased  number  of  stowed  kills.  The  unitary  booster  concept  also  reduces  the  effects 
of  thrust  misalignment  and  aerodynamic  instabilities  associated  with  the  original  four  cluster  booster 
design.  In  ADKEM  there  were  also  concerns  with  simultaneously  igniting  four  motors;  however,  the 
UNICORN  single  booster  concept  eliminates  this  issue. 


Fig.  4.  UNICORN  Alternate  Propulsion  Concept  for  ADKEM 


The  UNICORN  rod  and  tube  motor  configuration,  shown  in  Fig.  5,  consists  of  outer  and  inner  structural 
shells  with  an  annular  nozzle  throat.  The  ADKEM  lethal  centerbody  is  submerged  within  the  inner  shell 
and  egresses  from  the  motor  after  boost.  For  solid-fueled  motors,  a  rod  and  tube  grain  offers  performance 
characteristics that are advantageous for systems that require high burn rates. In the past, these advantages
have  been  offset  by  the  lack  of  an  efficient  means  to  support  the  inner  grain  inside  the  motor  chamber. 
AMCOM  addressed  this  problem  by  developing  an  inner  shell  support  structure  at  the  nozzle  throat. 

The  UNICORN  motorcase  incorporates  many  of  the  fabrication  techniques  used  in  ADKEM.  Both  outer 
and  inner  shells  are  filament  wound  using  carbon  fiber  and  an  epoxy  matrix.  The  outer  shell  utilizes  a  silica 
phenolic  nozzle  and  an  aluminum  forward  polar  boss  that  are  integrally  wound  with  the  motorcase.  The 
inner  shell  presented  significant  design  challenges  because  it  is  subjected  to  external  pressure  combined 




with  the  large  axial  force  resulting  from  pressure  acting  on  the  inner  portion  of  the  annular  nozzle.  The 
primary risk areas for the inner shell were in designing a lightweight composite structure to withstand the
external  buckling  loads  and  in  developing  effective  forward  and  aft  joints.  The  forward  joint  is  an  integrally 
wound  stainless  steel  cylindrical  adapter  with  locking  grooves.  The  aft  joint  is  accomplished  via  an  external 
version  of  the  integrally  wound  nozzle  concept.  The  silica  phenolic  nozzle  components,  shown  in  Fig.  6, 
have  integral  supports  that  maintain  alignment  between  the  outer  and  inner  shells. 


Fig.  5.  UNICORN  Motor.  Fig.  6.  UNICORN  Flightweight  Hardware. 


The  UNICORN  motor  was  designed  to  a  4700  psi  maximum  expected  operating  pressure,  a  7050  psi  design 
burst pressure, and a 350 ms burn time. A successful static firing of the UNICORN motor demonstrated
that  the  concept  could  meet  the  ADKEM  propulsion  requirements. 
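
The margin between the two design pressures quoted above can be checked with a one-line calculation. The pressures are from the text. The term "burst factor" for their ratio is a label introduced here for convenience.

```python
def burst_factor(design_burst_psi: float, meop_psi: float) -> float:
    """Ratio of design burst pressure to maximum expected operating pressure."""
    return design_burst_psi / meop_psi


unicorn_factor = burst_factor(design_burst_psi=7050.0, meop_psi=4700.0)
print(f"UNICORN burst factor: {unicorn_factor:.2f}")
```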

Structural  Health  Monitoring 

The small size of optical fibers and fiber optic based sensors makes them ideal candidates for building
embedded sensor networks within a filament wound structure. The Army is currently sponsoring research
through a Small Business Innovation Research (SBIR) agreement with Technology Development
Associates,  Inc.  The  research  focuses  on  several  critical  areas  related  to  embedded  fiber  optics  in  filament 
wound  structures.  These  areas  include  automated  embedding  of  sensor  arrays,  development  of 
ingress/egress  techniques,  and  assessment  of  the  effects  of  embedded  sensors  on  structural  integrity. 

The  issue  of  automated  embedding  is  being  addressed  by  utilizing  a  modified  carbon  fiber  delivery  eye  that 
locates  the  optical  fiber  directly  beneath  the  carbon  fiber  tow  as  it  contacts  the  part.  The  modified  fiber 
delivery  eye  is  shown  in  Fig.  7.  During  the  early  phases  of  the  program  several  fiber  optic  based  sensors 
were  investigated;  however,  Bragg  Grating  sensors  offered  the  most  promising  results.  Multiple  Bragg 
Gratings  can  be  applied  to  a  single  fiber.  Multiplexing  techniques  can  then  be  used  to  query  the  various 
gratings  along  the  fibers  to  obtain  strain  and  temperature  measurements.  Sensors  may  be  automatically 
embedded  along  both  helical  and  hoop  directions.  Fig.  8  shows  a  5.75  inch  diameter  pressure  vessel  with 
embedded  fiber  optic  sensors  and  surface  mounted  strain  gages  for  correlation  of  strain  measurement  data. 
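
The way a Bragg grating reading is turned into strain can be sketched with the standard first-order relation, in which the relative wavelength shift equals (1 - p_e) times strain plus a thermal term. The paper gives no sensor parameters, so the photoelastic coefficient, thermal coefficients, and the 1550 nm grating in the usage example are assumed typical values for bare silica fiber. An embedded grating would also respond to thermal expansion of the host laminate, which this sketch ignores.

```python
def strain_from_bragg_shift(wavelength_shift_nm: float,
                            bragg_wavelength_nm: float,
                            delta_t_k: float = 0.0,
                            photoelastic: float = 0.22,
                            thermal_expansion_per_k: float = 0.55e-6,
                            thermo_optic_per_k: float = 6.7e-6) -> float:
    """Axial strain from a Bragg wavelength shift, with temperature compensation.

    d(lambda)/lambda = (1 - p_e) * strain + (alpha + xi) * delta_T
    All default coefficients are assumed typical values for silica fiber.
    """
    relative_shift = wavelength_shift_nm / bragg_wavelength_nm
    thermal_term = (thermal_expansion_per_k + thermo_optic_per_k) * delta_t_k
    return (relative_shift - thermal_term) / (1.0 - photoelastic)


# Hypothetical reading: 1.2 nm shift on a 1550 nm grating at constant temperature.
strain = strain_from_bragg_shift(1.2, 1550.0)
print(f"strain = {strain * 1e6:.0f} microstrain")
```

Separating strain from temperature in practice requires an independent temperature reference, for example a strain-isolated grating on the same fiber. The `delta_t_k` argument stands in for that measurement.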


[Figure 7 panels: prepreg/optical fiber eye shown by itself; shown with optical fiber path; shown with both optical fiber path and carbon fiber path]


Fig.  7.  Automated  Embedding  of  Fiber  Optics.  Fig.  8.  Instrumented  5.75  inch  Bottle. 

Hydrostatic  testing  of  pressure  vessels  with  embedded  optical  fibers  has  shown  that  the  embedded  fibers 
have  little  effect  on  the  structural  integrity  of  the  pressure  vessel.  In  order  to  demonstrate  the  capabilities  of 
the  fiber  optic  sensor  array,  the  sensors  were  used  to  monitor  the  vessel  during  cure  and  hydrostatic  testing. 
Fig.  9  shows  strain  measurements  during  a  typical  150  °C  cure  cycle.  Strain  peaks  can  be  seen  during 
matrix  polymerization  and  mechanical  cross-linking.  The  residual  strain  present  in  the  structure  after  cure  is 




readily  evident.  Fig.  10  shows  a  high  correlation  between  Bragg  Grating  and  surface  mounted  strain  gage 
data  taken  during  a  hydrostatic  pressurization  experiment. 


Fig.  9.  Cure  Monitoring  Experiment  Fig.  10.  Pressurization  Experiment 

Results  to  date  indicate  fiber  optic  sensors  offer  a  useful  and  practical  tool  for  structural  health  monitoring 
of  filament  wound  composite  missile  structures.  The  Army’s  research  in  this  area  is  an  ongoing  effort. 


MATERIALS  TECHNOLOGY  FOR  FUTURE  MISSILE  SYSTEMS 

The  AMCOM  Missile  Research,  Development  and  Engineering  Center  is  actively  pursuing  technology  for 
the  Army  After  Next  (AAN)  initiative,  which  focuses  on  developing  warfighting  capabilities  for  the  2025 
timeframe.  One  of  the  primary  AAN  goals  is  to  create  a  highly  mobile  land  force  that  can  be  rapidly 
deployed  across  the  globe.  The  Future  Combat  System  (FCS)  or  Multi-Mission  Combat  System  (MMCS)  is 
the yet-to-be-defined complement to the Legacy force for the Army After Next. The FCS/MMCS will be
fast, lightweight and capable of engaging heavily armored ground vehicles. One possible scenario is to
reduce the vehicle's profile and weight by replacing the traditional cannon with both line of sight kinetic
energy  missiles  and  beyond  line  of  sight  missiles  that  can  defeat  the  next  generation  of  armored  targets. 

AMCOM envisions technology advancements for the FCS requirements that include a missile concept
representing the evolution of the kinetic energy missile technology developed for ADKEM and UNICORN.
The Compact Kinetic Energy Missile (CKEM) technology program offers lethality against hardened and reactive
armored  targets  at  a  minimum  range  of  400  meters  and  a  maximum  range  of  4  kilometers.  The  propulsion 
unit  and  airframe  act  as  the  delivery  vehicle  for  a  long  rod  penetrator  that  is  embedded  in  the  motor 
chamber.  A  line  sketch  of  the  CKEM  Technology  Testbed  configuration  is  shown  in  Fig.  11. 


The  system  goals  call  for  a  robust  propulsion  system  that  is  high  performance,  minimum  signature,  and 
non-detonable.  To  best  meet  these  goals  a  modification  of  the  ADKEM/UNICORN  high  rate  reduced 
smoke  propellant  has  been  formulated.  To  enable  a  large  number  of  missiles  to  be  stored  inside  a  low 
profile  ground  vehicle,  substantial  restrictions  on  missile  envelope  and  weight  are  necessary.  Parametric 
trade  studies  were  initiated  with  the  assumption  that  the  overall  missile  length  could  not  exceed  6  feet, 
launch weight would be no more than 110 pounds, and missile diameter would be kept to a minimum.

Trade study results showed that an increase in the propellant burn time along with a reduction in the
payload  weight  results  in  a  reduction  in  the  motor  length  and  weight.  An  increase  in  the  motor  diameter 
yields  a  reduction  in  the  motor  length  but  results  in  an  increase  in  the  motor  weight.  Inert  weight  was 
reduced  by  utilizing  lightweight  composite  materials  and  by  eliminating  the  weight  of  an  inner 
shell/centerbody by embedding the penetrator inside the motor chamber. Burn time was lengthened from
350 ms for ADKEM/UNICORN to 600 ms for CKEM. Longer burn times resulted in an inability to meet
minimum  range  requirements.  As  with  ADKEM  and  UNICORN,  the  CKEM  round  is  6  feet  long;  however, 
the  missile  base  diameter  has  been  reduced  to  6.5  inches.  CKEM  offers  a  62  percent  reduction  and  a  43 
percent  reduction  in  missile  frontal  area  as  compared  to  ADKEM  and  UNICORN,  respectively. 
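
The frontal area reductions follow from the ratio of base diameters squared. The sketch below recomputes them from the three diameters given in the text, assuming circular cross sections. It reproduces the 62 and 43 percent figures for CKEM, and gives roughly 34 percent for ADKEM to UNICORN, close to the 33 percent quoted earlier (the small difference is presumably due to rounding of the diameters).

```python
def frontal_area_reduction(old_diameter_in: float, new_diameter_in: float) -> float:
    """Fractional reduction in circular frontal area when the diameter shrinks."""
    return 1.0 - (new_diameter_in / old_diameter_in) ** 2


DIAMETERS_IN = {"ADKEM": 10.6, "UNICORN": 8.6, "CKEM": 6.5}  # from the text

for old, new in (("ADKEM", "UNICORN"), ("ADKEM", "CKEM"), ("UNICORN", "CKEM")):
    reduction = frontal_area_reduction(DIAMETERS_IN[old], DIAMETERS_IN[new])
    print(f"{old:8s} -> {new:8s}: {reduction:.1%}")
```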

The  CKEM  motorcase  consists  of  a  filament  wound  T-1000  carbon/epoxy  composite  motorcase  with  an 
integrally-wound  carbon  phenolic  nozzle  insulator  and  a  pinned  forward  closure.  The  practice  of  integrally 
winding  the  carbon  phenolic  nozzle  reduces  thrust  misalignment  and  parasitic  weight.  The  embedded 
penetrator  is  supported  at  its  forward  end  by  the  motor  closure  and  at  its  aft  end  by  a  four-spoked  silica 
phenolic  support  structure.  The  motorcase  forward  opening  is  a  near  open  end  design  to  allow  for 
penetrator  assembly.  The  composite  lay-up  in  the  forward  joint  region  consists  of  a  hybrid  laminate 
comprised  of  T-1000  carbon/epoxy  and  fiberglass/epoxy  for  bearing  strength.  The  forward  joint  has  been 
tested to 10,500 psi (72.4 MPa), which is 121% of the design burst pressure. The final flightweight design
resulted  in  a  propellant  weight  fraction  of  0.82.  The  configuration  is  shown  in  Fig.  12. 
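
The forward joint test figures can be cross-checked with a unit conversion and a simple ratio. The 10,500 psi test pressure and the 121% ratio are from the text. The design burst pressure itself is not stated in the paper, so the value computed below is only what those two numbers imply.

```python
PSI_TO_MPA = 0.006894757  # 1 psi expressed in megapascals


def psi_to_mpa(pressure_psi: float) -> float:
    """Convert pounds per square inch to megapascals."""
    return pressure_psi * PSI_TO_MPA


def implied_design_burst_psi(test_pressure_psi: float, ratio_to_design_burst: float) -> float:
    """Design burst pressure implied by a test pressure and its stated ratio."""
    return test_pressure_psi / ratio_to_design_burst


test_pressure_psi = 10_500.0
print(f"test pressure        : {psi_to_mpa(test_pressure_psi):.1f} MPa")
print(f"implied design burst : {implied_design_burst_psi(test_pressure_psi, 1.21):.0f} psi")
```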


Fig.  12.  CKEM  Motor  Layout.  Fig.  13.  Penetrator  Support  Integration. 

The  CKEM  geometry  requires  a  collapsible  filament  winding  tool.  MARCORE  polyisocyanurate  foam 
tooling  material  developed  by  Lockheed  Martin  at  NASA  Marshall  Space  Flight  Center  was  used  as  an 
expendable  winding  tool.  MARCORE  foam  is  compatible  with  epoxy  resin  systems,  is  easily  machined, 
and  is  unaffected  by  typical  oven  and  autoclave  cure  cycles.  Fig.  13  shows  integration  of  the  penetrator 
support  structure  with  the  winding  tool.  After  cure,  the  foam  is  removed  with  a  high  pressure  water  jet. 


The  design  point  selected  for  the  CKEM  has  given  rise  to  critical  design  issues  that  are  currently  being 
addressed.  Concern  has  been  raised  regarding  the  effects  of  the  high  velocity  flow  region  on  the  penetrator 
support  structure,  penetrator  support  structure  survivability,  and  nozzle  performance.  An  extensive 
experimental  test  program  using  full-scale  flightweight  hardware,  currently  in  fabrication,  is  planned  for  the 
evaluation  of  the  nozzle  components,  the  penetrator  support  structure,  and  the  motor  performance. 


CONCLUSION 

The  technical  issues  and  challenges  associated  with  demonstration  and  development  of  the  next  generation 
hypervelocity KE missile at half the current mass and size, but with increased lethality characteristics, are
difficult  to  solve  and  require  cutting  edge  missile  technology  advancements.  The  solution  of  these  technical 
challenges  should  be  accomplished  in  a  missile  system  context.  This  should  be  done  even  if  the  entire 
missile  system  will  never  be  flown  and  will  only  be  used  to  investigate  the  interaction  of  the  components  in 
a  virtual  prototype.  The  tradeoff  between  hypervelocity  missile  component  performance  and  missile  system 
performance  and  cost  requirements  is  perhaps  the  greatest  technical  challenge  of  all.  The  pursuit  of 
component  performance,  independent  of  system  constraints,  is  simply  inappropriate.  The  U.S.  Army 
Aviation  and  Missile  Command  has  utilized  advanced  composite  materials,  unique  fabrication  techniques, 
and innovative attachments and interfaces to provide lightweight, high performance structures to meet the
requirements  of  current  and  future  missile  systems.  Innovative  concepts  developed  by  AMCOM  provide 
enabling  technologies  for  significantly  enhancing  missile  operational  and  life  cycle  performance. 
Demonstrations  of  the  propulsion  unit  relative  to  system  constraints  are  the  backbone  for  realizing  the 
potential  for  achieving  the  desired  requirements  of  the  next  generation  hypervelocity  KE  weapon  system. 




Modelling  the  Mechanical  Stability  of  Metal  Catalyst  Carriers 

C.  Guist,  H.  Bode 

Bergische Universität-Gesamthochschule Wuppertal, Germany


ABSTRACT 

Efficient and flexible design of technical systems requires tools that
assist in assessing and predicting reliability. Simulation is of increasing value in this respect. In
simulation, it is the translation of a technical system to a virtual plane, the modelling process, that
determines success. This is illustrated by the example of simulating the mechanical stability
of metal catalyst carriers.


INTRODUCTION 

Function of a catalytic converter and general description of the product

Catalytic converters are installed in exhaust systems of internal combustion engines so that harmful gases
like CO, HC and NOx are transformed into less harmful gases such as CO2, H2O and N2. Platinum, rhodium
and palladium are examples of elements that act as catalysts. They are placed inside a porous, ceramic layer
known as the washcoat. The washcoat in turn lies on a substrate made either of ceramic (cordierite) or of a
high alloy steel containing Cr (~20%), Al (5-6%), and reactive elements (~0.03-0.1%) [1].

Figure 1 shows converters with a metal carrier and with a ceramic carrier, respectively. This report concentrates
on  metallic  substrates  and  the  products  derived  from  them. 




Fig.  1.  Catalyst  support  systems  (left)  metal  based  and  (right)  ceramic  based  [Courtesy  of  EMITEC  GmbH] 






The product generally used with a metal carrier, as shown on the left-hand side of Figure 1, consists of
smooth and corrugated pieces of foil, mostly 50 µm thick, and a so-called jacket. The foil is usually soldered
at the points of contact in the frontal area and is also joined to the jacket at previously designated
places by one or more soldered joints. Partial soldering of foil at defined points in the axial depth is also
possible  and  may  be  specified  depending  on  application. 

The  number  of  cells  made  in  this  way  depends  on  the  type  of  vehicle,  i.e.,  its  application,  and  there  may  be 
100 to 600 cells per square inch. When the metal carrier has been coated, it is installed in the exhaust system by
welding. 

Former  and  future  product  developments 

The  transformation  of  harmful  gases  is  almost  complete  with  the  use  of  optimised  products,  i.e.,  catalysts, 
in an equally optimised exhaust system. Until now, the requirement was for metal carriers to have high
oxidation and form stability, i.e., a sufficiently high resistance to high-temperature corrosion during the
entire lifetime of the product. This requirement is necessary because of the prevailing temperature load of
~900 °C during continuous operation. In the event of malfunction, the temperature may even exceed
1100 °C. These temperatures can be tolerated by steel foil but not by the porous washcoat layer, which
becomes  a  dense  layer  at  such  high  temperatures  -  hindering  access  of  exhaust  gas  to  the  catalyst  elements. 

Temperature  distribution  is  generally  uneven.  Results  of  development  work  in  the  field  of  high-temperature 
corrosion  resistance  were  compiled  in  1997  [2]. 

Owing to the specific advantages of metal carriers over ceramic ones [4], new developments in modern
production technology for catalyst carriers, as shown for example in [4], and for foil [5,6], and also
stricter emission tolerance limits [7], the application potential of metal carriers has greatly increased.
Their share of production today is about 10% (compared with 90% for ceramic carriers) and is growing annually at
double-digit rates.


[Figure 2 labels: catalytic converter; electrically heated catalyst; conical catalyst; vehicle emissions; years 1990, 1995, 1998]


Fig.  2.  Technical  Progress  in  Exhaust  Gas  After-treatment  for  Low-Emission  Vehicles 

Figure 2, updated from [8], shows some stages in product development. Lower emission tolerance
values have been achieved, but at the price of greater requirements for mechanical stability and
resistance to oxidation, through measures such as:

- placing the catalyst near the engine, and/or
- using a catalyst that can be heated electrically, or
- combining an electrically heated catalyst with an HC trap, or
- using a cone catalyst, etc.

The production stages require the application of computer-supported design tools and the implementation of
results from further developments in processing technology. It must be stressed that:

- a foil thickness of 30 µm instead of 50 µm is possible [9];
- an Al content of 8% (and perhaps higher) can be used [5,6] to provide improved oxidation resistance;
- the use of microwaved foil instead of smooth foil is possible [10];
- structured foil can be made to produce local turbulence in the exhaust stream [11];
- the maximum cell density of 600 cells/inch² can be increased to 1200 cells/inch² [12].
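
To give a feel for what these cell densities and foil thicknesses mean geometrically, the sketch below converts cell density into an equivalent cell pitch and a rough open frontal area. It assumes idealised square cells with uncoated walls of one foil thickness. Real metal carriers have corrugated channels and a washcoat, so the numbers indicate trends only and are not taken from the paper.

```python
import math

MM_PER_INCH = 25.4


def equivalent_pitch_mm(cells_per_sq_inch: float) -> float:
    """Side length of a square cell that gives the stated cell density."""
    return MM_PER_INCH / math.sqrt(cells_per_sq_inch)


def open_frontal_area(cells_per_sq_inch: float, foil_thickness_mm: float) -> float:
    """Open fraction of the face for idealised square cells (assumption)."""
    pitch = equivalent_pitch_mm(cells_per_sq_inch)
    return ((pitch - foil_thickness_mm) / pitch) ** 2


for density, foil_mm in ((600, 0.050), (1200, 0.050), (1200, 0.030)):
    print(f"{density:5d} cells/inch2, {foil_mm * 1000:.0f} um foil: "
          f"pitch {equivalent_pitch_mm(density):.2f} mm, "
          f"open area {open_frontal_area(density, foil_mm):.1%}")
```

Under these assumptions, doubling the cell density with 50 µm foil reduces the open area, while moving to 30 µm foil recovers it, which is one reason thinner foil and higher cell density are developed together.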

Former  and  future  requirements  of  the  product  life  span 

Earlier requirements for product life span were 100,000 km. This minimum life expectancy will be increased
to 180,000 km in the future. The reduced emission limits mentioned above have triggered product
development which, along with the demand for an extended product life span, has meant very exacting
development tasks for catalyst and vehicle manufacturers and for their suppliers. Among other
things, there is a need to test further processing developments, e.g., making 30 µm thick foil instead of
50 µm foil, without conflicting with the need for a simultaneous increase in the life span of the product.

Former  procedure  in  product  design 

Oxidative and mechanical requirements of catalysts are specific to vehicles. Securing the highest
protection against product malfunction means that, apart from applying generally
valid design procedures, e.g., finite element methods and know-how, vehicle-specific
tests are also necessary. These include large-scale, expensive experiments on engine test stands, exhaust
gas simulation tests and vehicle endurance runs. Details about such tests may be found in the literature [13].

Proposal  for  application  of  modelling  for  future  procedure  in  product  design 

Given

- a sharp increase in new applications,

- the required protection against malfunction, in particular in new applications, and

- the above mentioned future development of products,

simulation of vehicle-dependent loads, with load variants or with different engines within one type of
vehicle, and also the comparison of materials and product features, can greatly reduce development time.
Simulation may even be more cost effective while still providing the required protection against product malfunction.
Preliminary research in this area has been carried out at the Bergische Universität-GH Wuppertal in the
Department of Materials.

The  following  exposition  focuses  on  investigating  mechanical  strength,  since  modelling  is  only  possible 
when  material  and  component  characteristics  are  known  for  the  conditions  present  in  a  vehicle.  Essential 
observations  concerning  load  are  in  the  forefront  of  our  investigation.  Afterwards  there  will  be  a  short 
explanation  as  to  the  procedure  for  assessing  mechanical  stability  where  a  reference  system  is  used  with  a 
researched  load  profile.  Further  information  on  the  modelling  method  appears  in  [14]. 


ANALYSIS  OF  INFLUENCING  PARAMETERS  AND  MODELLING 

During driving, the catalyst is subjected to an extremely high load. This load is made up of a
variety of influences. These include the installation conditions (vehicle, engine, exhaust system,
etc.) and the driving operation (distance covered, driving style, etc.).

At the same time the catalyst itself affects these influences. Together, they
determine the boundary conditions under which the catalyst must carry out its function.

This mechanism can be described as follows (see Figure 3):




[Figure 3: chart contents recovered from the figure labels]

A. Vehicle boundary conditions: exhaust gas emissions, engine vibrations, vehicle vibrations, environment, layout, model
B. Catalyst constraints: 1. flow distribution, 2. temperature gradient, 3. pressure distribution, 4. mechanical load, 5. chemical constraints
C. Load characteristics: 1. mechanical load, 2. influence on material properties and design
D. Reaction: 1. loss of mechanical stability, 2. corrosion, 3. thermal ageing, 4. poisoning, 5. erosion
E. Requirements: 1. mechanical stability of carrier, 2. conversion behaviour

Fig.  3.  Process  chart  of  exhaust  system 


The  boundary  conditions  produce  a  load  profile  that  affects  the  catalyst,  e.g.,  the  exhaust  system  produces  a 
specific  emission  which  has  an  effect  upon  the  catalyst.  This  triggers  certain  reactions  in  the  catalyst.  The 
type  and  extent  of  these  reactions  establish  whether  catalyst  demands  are  being  met. 


If a balance envelope is drawn around the system boundaries of the catalyst (shown in simplified form in
Figure 4 for a close-coupled, dual-flow catalytic system of a 6-cylinder engine), the catalyst's boundary
conditions are those listed in row B: flow distribution, temperature gradients, pressure distribution,
mechanical load and chemically active environment.

The load profile emerges from the interaction of these parameters with the catalyst. It consists of the
mechanical load and the influence on material characteristics and product design. The catalyst's
reaction results from the relationship between load and strength. One such reaction is the mechanical
stability response, e.g., in the sense of fatigue durability. The connections shown in Figure 3 are to be
interpreted such that each subsequent link is a function of all preceding links.
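
The dependency chain just described can be written compactly. The notation below is introduced here purely for illustration and does not appear in the original: A to E stand for the five links of Figure 3 (vehicle boundary conditions, catalyst constraints, load characteristics, reaction, requirements), and the functions f are left unspecified.

```latex
\begin{align}
B &= f_B(A), \\
C &= f_C(A, B), \\
D &= f_D(A, B, C), \\
E &= f_E(A, B, C, D).
\end{align}
```

Read this way, a change in the vehicle boundary conditions A propagates through every later link, which is why the load profile cannot be assessed in isolation from the installation.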


QUALITATIVE  RESULTS  OF  AN  INVESTIGATION  INTO  STRESS 


Reference  system  and  indication  of  construction  variable 

The  reference  system  consisting  of  a  first  and  a  second  catalyst  can  be  seen  in  Figure  4. 


Fig.  4:  Schematic  graph  of  reference  exhaust  system. 


Table 1 provides data about the reference system. Note the differences between the two catalysts in
diameter, length, brazed length, cell density, foil thickness and design.


Table 1: Specifications of Reference Catalysts

Diameter [mm]   Length [mm]   Brazed length [mm]   Cell density [cpsi]   Foil thickness [mm]   Design
80              50.8          2-15                 300                   0.065                 SM, W3
98.4            90            2-25                 400                   0.05                  S, W5



Load  components  with  influence  on  mechanical  stability  (fatigue  durability) 

The procedure for investigating load components that influence mechanical stability (fatigue durability)
is described in [14]. The qualitative results are shown in Table 2. Accordingly, the following must be
examined more closely:

vibration  loads, 
thermal  loads  and 
flow  loads. 

This entails quantifying them as tension (stress) and expansion (strain) values, either experimentally or
through load simulation, and comparing them with material and component characteristics expressed in the
same terms.


Table 2: Fatigue load components

Direction / frequency     Vibration load       Thermal load                       Gas flow load
axial / high frequency    Inertia forces       -                                  Flow forces
axial / high frequency    Natural frequency    -                                  -
radial / low frequency    -                    Deformation due to thermal load    -


RESULTS  OF  INVESTIGATIONS  INTO  STRENGTH 

Material  and  component  strength 

Because of the production technology used, which includes the high-temperature soldering process and
local soldering, foil characteristics alone are not sufficient when studying mechanical stability;
component characteristics must be considered as well (see also the details about the reference system).

Stress on the catalyst is caused by high-frequency processes in the engine (inertial forces, load
changes) and by low-frequency processes from transient driving operation (thermally induced stresses).
This mainly dynamic load influences the catalyst's strength behaviour.

Dynamic strength characteristics were established by fatigue tests. High-frequency load results from
inertial and flow forces acting from outside. Fatigue tests were carried out analogous to this load
under force control.

Static strength characteristics are needed to assess the load from transient driving operation, i.e.,
thermally induced tensile and compressive stresses, with regard to component stability. One limitation
should be noted: the influence of time under load is not taken into account in this report, i.e., creep
and creep fatigue are not considered in the assessment.

Tension  tests 

Under static load, the limit of elasticity decreases with increasing temperature. The transition from
elastic to plastic cell deformation therefore depends strongly on temperature. Quantitative values may
be found in the chart referred to in the section "Procedure for defining mechanical stability".

Pressure  tests 

Figure 5 shows pressure as a function of compression travel. The cube-shaped samples were made from
smooth and corrugated foil pieces soldered together at their contact points. The number of layers, i.e.,
the number of corrugated foil layers between smooth layers, was varied up to 22. A modulus of elasticity
defined for this application may be derived from the almost linear rise in force in the elastic range
(the range of proportionality). It depends on the number of layers. Resistance to elastic deformation
also increases with cell density. This relationship must be taken into account in any model intended to
be generally valid.
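
As an illustration of how such an application-specific modulus might be extracted from a measured curve, the following minimal sketch fits a straight line to the proportional region by least squares. The function name, the cut-off parameter `limit`, the units and all data values are assumptions made for this example; they are not measurements or methods taken from the paper.

```python
from statistics import mean


def effective_modulus(compression, pressure, limit):
    """Least-squares slope of pressure versus compression in the linear region.

    compression : compression values (hypothetical units, e.g. mm/mm)
    pressure    : matching pressure values (hypothetical units, e.g. N/mm^2)
    limit       : largest compression still treated as proportional; this
                  cut-off is an assumption read off the measured curve
    """
    pts = [(x, y) for x, y in zip(compression, pressure) if x <= limit]
    if len(pts) < 2:
        raise ValueError("need at least two points in the linear region")
    xm = mean(x for x, _ in pts)
    ym = mean(y for _, y in pts)
    sxx = sum((x - xm) ** 2 for x, _ in pts)
    sxy = sum((x - xm) * (y - ym) for x, y in pts)
    return sxy / sxx


# Made-up readings for illustration only (not data from the paper):
# linear up to 0.04, then flattening as the cells deform plastically.
compression = [0.00, 0.01, 0.02, 0.03, 0.04, 0.06, 0.08]
pressure = [0.0, 0.5, 1.0, 1.5, 2.0, 2.3, 2.4]
modulus = effective_modulus(compression, pressure, limit=0.04)
```

Repeating the fit for samples with different numbers of layers or cell densities would give the dependence that a generally valid model has to reproduce.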


Fig. 5: Graph of pressure versus compression (specimen: 2 layers, 400 cpsi, 0.05 mm). Plot annotation: contact of first layer.

Dynamic  tests 

Figure 6 shows a result from dynamic tests on soldered samples under alternating tension-compression
loading at room temperature and 600 °C. It can be seen that only very small stress amplitudes in the
elastic range can be tolerated at 900 °C.

Investigation  into  allowable  expansion 

A mathematical model described in [14] shows that the expansion that occurs depends on the radial
dimensions of the catalyst. Accordingly, the maximum allowed expansion applies near the jacket: 0.06
mm/mm at room temperature and 0.048 mm/mm at the tested 800 °C, as shown in Figure 7.

Comparison  Between  Temperature-Related  Mechanical  Load  and  Limit  of  Elasticity  and  Expansion 

In the upper right-hand part of Figure 7, load and strength are compared. It shows that the mechanical
load caused by the temperature profile in the outer layers (comparative tension) is greater than the
temperature-dependent limit of elasticity at this point. Plastic deformation can be expected in these
areas. The same diagram shows the expansion occurring in the reference system in relation to the maximum
allowed expansion. The procedure for determining comparative tension and maximum allowed expansion is
described in [14].
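
The load-versus-strength comparison can be sketched in a few lines. Only the two allowed-expansion values (0.06 mm/mm at room temperature, 0.048 mm/mm at 800 °C) come from the text. Taking room temperature as 20 °C, interpolating linearly between the two points, and the example elastic-limit curve and load values are all assumptions made for illustration.

```python
def interpolate(t, t0, v0, t1, v1):
    """Linear interpolation between (t0, v0) and (t1, v1), clamped to that range."""
    if t <= t0:
        return v0
    if t >= t1:
        return v1
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def allowed_expansion(temp_c):
    # 0.06 and 0.048 mm/mm are quoted in the text; 20 °C as "room temperature"
    # and the straight line between the two points are assumptions.
    return interpolate(temp_c, 20.0, 0.06, 800.0, 0.048)


def assess(temp_c, tension, expansion, elastic_limit):
    """Compare load with strength at one location of the catalyst.

    elastic_limit is a function temperature -> limit of elasticity, to be
    supplied from tension tests; it is not given in the text.
    """
    return {
        "plastic_deformation_expected": tension > elastic_limit(temp_c),
        "expansion_within_limit": expansion <= allowed_expansion(temp_c),
    }


def example_elastic_limit(temp_c):
    # Hypothetical curve (N/mm^2) used only to make the example runnable.
    return interpolate(temp_c, 20.0, 300.0, 800.0, 50.0)


result = assess(temp_c=800.0, tension=80.0, expansion=0.03,
                elastic_limit=example_elastic_limit)
```

With these made-up inputs the tension exceeds the elastic limit while the expansion stays below the allowed value, which mirrors the situation described for the outer layers.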


PROCEDURE  FOR  DEFINING  MECHANICAL  STABILITY 

The  main  procedure  for  defining  mechanical  stability  can  also  be  seen  in  Figure  7.  This  diagram  shows  not 
only  characteristic  material  and  component  features,  but  also  quantitative  data  that  have  been  taken  from 
the  reference  system  by  appropriate  analysis  [14].  According  to  this  analysis,  temperature-related  tension  is 
of  the  greatest  importance. 




At the present state of modelling, mechanical stability must be assessed individually for each case. The
aim of a generally valid model will be to compare one or two stress values, e.g., tension and expansion,
with the actual material and component characteristics. This comparison must also take into account the
duration of the stress, which illustrates the complexity of the modelling process.


REFERENCES 

1. H. Bode, 1997. Developmental status of materials for metal supported automotive catalysts, in [3], 17-31.

2. H. Bode, ed., 1997. Metal-Supported Automotive Catalytic Converters. Wiley-VCH, ISBN 3-883555-254-2.

3. G. Faltermeier, B. Pfalzgraf, R. Brück, C. Kruse, W. Maus, A. Donnerstag, 1996. Katalysatorkonzepte
für zukünftige Abgasgesetzgebungen am Beispiel eines 1,8 l 5V-Motors. 17. Inter. Motor. Symp., 25-26,
Vienna.

4. W. Maus, R. Brück, 1998. The Conical Catalytic Converter - Potential for Improvement of Catalytic
Effectiveness. SAE Paper 982633.

5. I.M. Subkonnik, S. Chang, B. Jha, 1997. DuraFoil ICR - a New Material for Catalytic Converter
Substrates, in [3], 93-97.

6. A. Kolb-Telieps, J. Klöwer, R. Hojda, U. Heubner, 1997. A New Production Technique for Fe-Cr-Al,
in [3], 99-104.

7. H. Oetting, ed., 1996 and 1998. Future Emission Legislation in Europe and USA: Technical
Solutions, Petrol Engines. Haus der Technik e.V., Essen, Germany, 1996: No. 30-918-056-6.

8. W. Maus, 1997. Mobility, Prosperity and Environmental Protection - the catalytic converter is
indispensable, in [3], 3-13.

9. J. Klöwer, H. Bode, M. Brede, R. Brück, A. Kolb-Telieps, L. Wieres, 1998. Development of
High-Temperature Resistant Fe-Cr-Al Alloys for Metal Exhaust Gas Catalysts. Materials Week 1998,
Munich, Section 2, ISBN 3-527-29955-6.

10. U. Martin, 1994. Research into Flow Relationships in the Cell Channels of an Exhaust Catalyst with
Metal Substrate by Simulation on a Model. M.Sc. Thesis, Bergische University-GH Wuppertal, Dept.
Mech. Eng.

11. R. Brück, J. Diringer, U. Martin, W. Maus, 1995. Flow improved efficiency by new cell structures in
metallic substrates. SAE Paper 950788.

12. W. Maus, R. Brück, P. Hirth, 1998. Theoretische Auslegung eines Abgasnachbehandlungssystems zur
Einhaltung der kalifornischen SULEV-Grenzwerte. 2nd Paper, given at 4th Symposium
“Entwicklungstendenzen bei Ottomotoren”, TAE-Symposium 23956/64.153, Esslingen, Germany.

13. T. Nagel, W. Maus, J. Breuer, 1997. Development of more Exacting Test Conditions for
Close-Coupled Converter Applications, in [3], 107-126.

14. C. Guist, 1998. Dauerhaltbarkeit von bestehenden Abgassystemen mit metallischen Katalysatorträgern
und Möglichkeiten der Zuverlässigkeitserhöhung. To be published, Bergische Univ. Wuppertal, FB 14,
D 468.




Integration of Newly Developed AI Assembly,
Production, and Material Flow Virtual Tools

Daniel  A.  Holder*,  Raymond  D.  Harrell*,  Terri  L.  Calton**,  John  F.  Atkinson*, 

Brandy  M.  Brasfield* 

*US  Army  AMCOM,  Redstone  Arsenal,  Alabama,  USA 
**Intelligent  Systems  and  Robotics  Center,  Sandia  National  Laboratories***, 
Albuquerque,  NM  87185-1008,  USA 

***Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed
Martin company, for the United States Department of Energy under contract
DE-AC04-94-AL85000.

ABSTRACT 

In this paper we discuss the applicability of artificial intelligence virtual tools in addressing
real-world design-for-manufacturing issues, and the lessons learned from experimenting with this
approach. We address a current project for which a 70% reduction in schedule time has been predicted as
achievable. Operation of the project's real production line is discussed, with a comparison of predicted
versus actual values.


INTRODUCTION 

The  integration  of  production  process  modeling  and  material  flow  software  with  newly  developed  Artificial 
Intelligence  (AI)  assembly  software  provides  a  critical  link  in  creating  a  seamless  virtual  manufacturing 
environment  from  solid  model  designs  through  manufacturing  issues  to  facility  planning  and  layout. 

Production simulation modeling based on AI has been proven in the United States Army over the last
several years to provide greater insight into the cost, schedule and efficiency of building and
maintaining weapon systems. In every case, when applied properly, it has provided additional unexpected
benefits, from healing contractual disputes to aiding in reengineering business processes. Adding
material flow software to this capability allows facility planning to be optimized.

The capabilities above are currently available as commercial off-the-shelf software tools. Historically,
production simulation assumed that the assembly order was known. No consideration was given to whether
the assembly order was optimal or even possible. A method of performing constraint-based “what-if”
analysis on assembly planning in conjunction with production and material flow planning was needed.
Recently, Archimedes, under continued development at Sandia National Laboratories, has demonstrated that
constraint-based interactive assembly planning software has come of age. Archimedes has demonstrated
success in planning, optimizing, simulating, visualizing, and documenting sequences of assembly. Given a
CAD model of the product, Archimedes automatically finds part-to-part contacts, generates collision-free
insertion motions and chooses an assembly order. Combined with an engineer’s knowledge of
application-specific assembly process requirements, it allows systematic exploration of the alternative
assembly sequences. Archimedes implements AI in the planning, geometric reasoning, and search algorithms,
the constraint-based implementation heuristics, and the graphical user interface.


AI  ASSEMBLY  SOFTWARE 

Manufacturing companies throughout the world are rapidly changing in order to survive in today's highly
competitive market environments. Some examples of coping with changing environments are
manufacturing globalization, automated and intelligent manufacturing, virtual manufacturing, and agile
manufacturing. The objective of this movement in manufacturing is to improve flexibility, reliability and
productivity, and to achieve competition-based technology development.

0-7803-5489-3/99/$10.00 ©1999 IEEE.

Accordingly, the main focus of Sandia's automated assembly analysis and planning research and
development program is to provide intelligent software tools which automate many of the manufacturing
processes that have traditionally been known to be the most costly, the most time-consuming, and the most
error-prone. Some of these include part-level assembly planning, fixture planning, grasp planning, motion
planning, tools planning, and cost analysis. Sandia's overall strategy to reduce these costs is to push the
breadth of application and depth of analysis and to find an appropriate balance between human and
machine planning. Figure 1 helps illustrate this concept. The ultimate goal is to improve profitability of
operations by developing smart software. The developers of the Archimedes software focus on the
limitations of the commercially available software packages and the needs of the manufacturing community
to provide better solutions more quickly.


[Figure 1 labels recovered from the diagram: Design; Manufacturing Constraints; Part-level Assembly Planning; Fixture planning; Grasp planning; Motion planning; Tools planning; Cost estimates; Assembly Shop Floor.]



Fig.  1.  Geometric-reasoning  for  manufacturing  processes. 

The Archimedes 4.0 system is a constraint-based interactive assembly planning software tool used to plan,
optimize, simulate, visualize, and document sequences of assembly. Given a CAD model (ACIS®
representation) of the product, the program automatically finds part-to-part contacts, generates
collision-free insertion motions, and chooses an assembly order. The engineer specifies a quality metric
in terms of application-specific costs for standard assembly process steps, such as part insertion,
fastening, and subassembly inversion. Combined with an engineer's knowledge of application-specific
assembly process requirements, Archimedes allows systematic exploration of the space of possible assembly
sequences. The engineer uses a simple graphical interface to place constraints on the valid assembly
sequences, such as defining subassemblies, requiring that certain parts be placed consecutively with or
before other parts, declaring preferred directions, etc.

The  system  considers  thousands  of  combinations  of  ordering  and  operation  choices  in  its  search  for  the  best 
assembly  sequences  and  ranks  the  valid  sequences  by  the  quality  metric.  Graphical  visualization  enables 
the  engineer  to  easily  identify  process  requirements  to  add  as  sequence  constraints.  Planning  is  fast, 
enabling  an  iterative  constrain-plan-view-constrain  cycle.  For  some  restricted  classes  of  products,  it 
determines  plans  that  optimize  a  given  cost  function,  graphically  illustrates  those  plans  with  simulated 
robots,  and  facilitates  the  generation  of  robotic  programs  to  carry  out  those  plans  in  a  robotic  workcell. 
Figure  2  represents  the  overall  structure  of  the  system.  At  the  top-middle  and  on  the  left-hand  side  are  the 
design  and  constraint  modules,  which  capture  and  represent  the  geometric,  mechanical,  and  other 
information  about  the  product  required  for  analysis.  These  constraints  come  from  a  wide  variety  of  sources: 
design  requirements,  part  and  tool  accessibility,  assembly  line  and  workcell  layout,  requirements  of  special 
operations,  and  even  supplier  relationships;  they  can  drive  the  choice  of  a  feasible  or  preferred  assembly 
sequence. 
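
The idea of ranking constrained assembly sequences by a quality metric can be sketched as follows. This is not the Archimedes algorithm, which relies on geometric reasoning over the CAD model; it is a brute-force illustration that assumes the precedence constraints are already known. The part names, costs, insertion directions and the inversion penalty are all hypothetical.

```python
from itertools import permutations


def valid(sequence, precedence):
    """True if every (before, after) pair in `precedence` is respected."""
    position = {part: i for i, part in enumerate(sequence)}
    return all(position[a] < position[b] for a, b in precedence)


def sequence_cost(sequence, insert_cost, direction, inversion_cost):
    """Insertion costs plus a penalty each time the insertion direction changes."""
    total = sum(insert_cost[p] for p in sequence)
    changes = sum(1 for a, b in zip(sequence, sequence[1:])
                  if direction[a] != direction[b])
    return total + inversion_cost * changes


def rank_sequences(parts, precedence, insert_cost, direction, inversion_cost):
    """Enumerate all orderings, keep the valid ones, sort by cost (ties by name)."""
    candidates = (s for s in permutations(parts) if valid(s, precedence))
    return sorted(
        candidates,
        key=lambda s: (sequence_cost(s, insert_cost, direction, inversion_cost), s),
    )


# Hypothetical four-part product, for illustration only.
parts = ["base", "shaft", "gear", "cover"]
precedence = [("base", "shaft"), ("shaft", "gear"), ("base", "cover")]
insert_cost = {"base": 1.0, "shaft": 2.0, "gear": 2.0, "cover": 1.5}
direction = {"base": "down", "shaft": "down", "gear": "down", "cover": "up"}

ranked = rank_sequences(parts, precedence, insert_cost, direction, inversion_cost=3.0)
best = ranked[0]
```

Exhaustive enumeration grows factorially with part count, so it is only practical for a handful of parts; it is used here because it makes the constrain, plan and rank steps easy to see.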


[Figure 2 block labels recovered from the diagram: Assembly Design (CAD); Other Constraints; Robot Instructions.]



Fig.  2.  The  Archimedes  4.0  Assembly  Analysis  and  Planning  Software  System. 

The  modules  listed  on  the  right-hand  side  are  the  output  modules.  They  include  options  to  capture  the 
sequences  in  the  form  of  3D-animations  and  videos,  textual  scripts  and  snap-shots  that  can  be  used  for 
maintenance  instructions  and  technical  publications.  The  system  also  generates  skeleton  scripts  to  run 
robots,  cost  analysis  information,  and  ergonomic  analysis  information. 

Throughout  the  development  the  system  has  been  applied  to  a  wide  variety  of  products  from  industry  and 
government  and  has  been  tested  on  over  100  assemblies.  Assembly  part-count  ranges  from  5  to  1500. 
ACIS®  data  sizes  range  from  0.2  MB  to  212  MB  where  the  data  for  each  distinct  part  is  counted  only  once, 
regardless  of  the  number  of  times  that  part  appears  in  the  assembly.  Planning  times  vary  from  4  seconds  up 
to  approximately  6  hours.  Planning  times  given  are  those  required  to  load  in  the  pre-facetted  data,  identify 
all  contacts  in  the  assembly,  and  find  a  single  geometrically  valid  part-level  assembly  sequence.  Times 
were  reported  using  an  SGI  Indigo  Extreme  workstation. 

Statistical results indicate savings in both time and money. Early reports by some users show more than a
75% reduction in schedule time and a 25% reduction in prototype fabrication cost.


PRODUCTION  PROCESS  MODELING 

Once an assembly process has been defined and optimized using Archimedes, the next step is to introduce
and test the process inside a manufacturing facility. This step introduces several more constraints
outside the scope of Archimedes that can have a dramatic impact on cost and schedule. These constraints
can include throughput requirements, multiple manufacturing lines sharing common resources, availability
and shifts of labor, pre- and post-processing of assembly parts such as environmental testing and oven
curing, sub-assembly part availability, power outages, machine breakdown and repair, yields, bottlenecks,
rework, lot sizes, part starts/day, and the quantity and availability of tooling and fixtures. Additional
constraints, including contractual agreements, business processes, and safety and environmental
regulations, can have dramatic and pivotal impacts upon the assembly times. The most effective means of
accounting for these constraints is production process modeling.

Production modeling has been available commercially for the last two decades. However, only in the
last couple of years has production process modeling become the versatile tool it is today. Standards
such as Object Linking and Embedding (OLE), Open Database Connectivity (ODBC), multimedia, and the
Component Object Model (COM) have been adopted, enabling easier integration with other software tools.




The multimedia capability has currently been determined to be the best way to tie the Archimedes output
to the process model. The simulation can run the video files to visually demonstrate the assembly
procedures at various stages of the process simulation. Combining these two visual elements gives the
user a unique perspective and an opportunity to modify assembly procedures as a result of
process-imposed constraints. The iteration between assembly process and manufacturing process can
continue until a satisfactory result has been achieved.


MATERIAL  FLOW  SOFTWARE 

Assembly  sequence  and  process  simulation  have  now  provided  a  solution  to  the  majority  of  issues 
surrounding  successful  manufacturing  planning  and  implementation.  The  few  that  remain  involve  the 
actual  placement  of  machines,  personnel,  walkways  and  material  handling  routes  and  storage.  These  can  be 
accomplished  with  material  flow  software. 

This software requires the process simulation information; the physical size, quantity, and access space
of the machines or workstations; and the physical layout of the facility, with doors, walls, and any other
physical constraints of the building structure. The software then searches for the optimal placement of
machines based on the process flow information, attempting to arrange the major and minor flows of
production in a manner that minimizes congestion of material handling and crossover of personnel
movements. This software is not limited to a single facility; it can also be used for multi-building
configurations where material handling must be performed between the buildings.
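
A placement search of this kind can be illustrated with a small flow-times-distance objective. The sketch below is a generic brute-force layout search, not the method of any particular commercial package. The machine names, candidate locations, distances and flow volumes are hypothetical.

```python
from itertools import permutations


def handling_cost(assignment, flow, distance):
    """Sum of flow volume x travel distance for a machine -> location assignment."""
    return sum(
        volume * distance[assignment[a]][assignment[b]]
        for (a, b), volume in flow.items()
    )


def best_layout(machines, locations, flow, distance):
    """Try every assignment of machines to locations and keep the cheapest one."""
    best = None
    for perm in permutations(locations, len(machines)):
        assignment = dict(zip(machines, perm))
        cost = handling_cost(assignment, flow, distance)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


# Hypothetical cell: three stations placed at three spots along one aisle.
machines = ["kitting", "assembly", "oven"]
locations = ["L1", "L2", "L3"]
distance = {  # metres between candidate locations (made up)
    "L1": {"L1": 0, "L2": 10, "L3": 20},
    "L2": {"L1": 10, "L2": 0, "L3": 10},
    "L3": {"L1": 20, "L2": 10, "L3": 0},
}
flow = {  # moves per shift between stations (made up)
    ("kitting", "assembly"): 5,
    ("assembly", "oven"): 8,
    ("oven", "assembly"): 8,
}

total_cost, layout = best_layout(machines, locations, flow, distance)
```

In this toy case the station with the heaviest traffic in both directions ends up in the middle location. Real layout tools must also handle walls, doors and aisle congestion, which this objective ignores.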

This  kind  of  analysis  can  give  insight  into  alternative  approaches  to  assembly  and  process  optimization 
particularly  where  shared  resources  and  limited  production  spaces  are  involved.  Once  again  an  iterative 
approach  is  used  in  order  to  optimize  design  to  process  simulation  to  material  flow. 


APPLICATION 

A U.S. Army missile wing and probe assembly is being analyzed to determine the optimal assembly
sequence, the number of operators required to ramp up production, and the optimal allocation of labor, in
order to effect an overall reduction in process lead times.

The  current  process  for  the  wing  and  probe  assembly  proposes  that  all  assemblies  be  performed  by  one 
operator  at  an  assembly  workstation.  The  wing  and  probe  are  both  made  separately  at  the  same  workstation 
and  are  not  assembled  together.  The  current  workflow  consists  of  five  first  order  assemblies  preceding  a 
series  of  additional  assemblies  and  seven  oven  cures.  The  current  production  facility  contains  four 
assembly  workstations  with  one  operator  to  man  each  workstation.  There  is  an  additional  employee  who 
works  as  a  material  handler  and  is  responsible  for  ensuring  that  the  operators  are  supplied  with  assembly 
kits,  placing  subassemblies  in  the  oven,  keeping  the  oven  login  sheet  updated,  removing  cured  assemblies 
from  the  oven  and  distributing  them  back  to  the  operator.  There  is  currently  one  oven  with  the  capacity  to 
handle  the  seven  oven  curing  processes.  According  to  the  contractor,  there  is  additional  space  for  three 
additional  ovens  and  six  additional  workstations  each  to  be  manned  by  one  operator.  These  additional 
workstations  will  be  utilized  when  production  ramps  up  to  higher  volumes. 

A baseline simulation model of the current process was set up using WITNESS simulation software. To
determine the most effective way of utilizing the current process, a series of variations was made to the
baseline model. The variations included changing shift schedules, increasing the number of workstations
and operators, changing which subassembly was performed first, and incorporating a learning curve into
the process. The variations did not show a significant difference in throughput; however, starting
subassemblies according to the longest remaining assembly time showed a significant decrease in work in
process. All variations of the model showed that the work in process was at least twice the throughput.
The work in process is attributed to the fact that operators continue to start new subassemblies during
the oven cures: the cures are so long that the operator has time to start numerous new subassemblies.
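
The link between long oven cures and high work in process can be seen in a toy model. The sketch below is not the WITNESS model used in the study. It assumes a single operator, one-hour assembly steps before and after a cure, an oven that never fills up, and an operator who finishes cured units first and otherwise starts a new kit. All durations are hypothetical.

```python
from collections import deque


def simulate(cure_hours, horizon_hours):
    """Toy one-operator cell stepped in whole hours.

    Each unit needs one hour of assembly, an oven cure of `cure_hours`, and
    one hour of final assembly. Returns (units finished, peak work in process).
    """
    in_oven = deque()  # hour at which each curing unit becomes available
    waiting = 0        # cured units waiting for final assembly
    wip = 0            # units started but not yet finished
    max_wip = 0
    finished = 0
    for t in range(horizon_hours):
        # Units whose cure has ended join the queue for final assembly.
        while in_oven and in_oven[0] <= t:
            in_oven.popleft()
            waiting += 1
        if waiting:
            # Finishing cured units takes priority over starting new kits.
            waiting -= 1
            wip -= 1
            finished += 1
        else:
            # Nothing to finish, so the operator starts another kit.
            wip += 1
            in_oven.append(t + 1 + cure_hours)
        max_wip = max(max_wip, wip)
    return finished, max_wip


long_cure = simulate(cure_hours=4, horizon_hours=20)
short_cure = simulate(cure_hours=1, horizon_hours=20)
```

In this toy setting both runs finish the same number of units, because the operator is the limiting resource, but the long cure lets more than twice as many units pile up in process.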




From  the  baseline  model  results  it  was  also  determined  that  the  current  process  could  not  meet  future 
production  requirements.  If  the  additional  operators,  workstations,  and  ovens  are  utilized  during  full  rate 
production  the  process  can  only  produce  approximately  30%  of  demand.  To  increase  throughput  and  meet 
demand,  product  will  have  to  be  produced  ahead  of  schedule  or  process  improvements  will  have  to  be 
made. 

To aid in determining the optimal assembly sequence, Archimedes will be used. While the current assembly
sequence may already be optimal, Archimedes should establish what the optimal sequence is. If the optimal
assembly sequence is already in use, alternative solutions will be considered, such as changing the
process from an assembly cell to an assembly line. The process is currently being simulated as an
assembly line to determine whether there is an increase in throughput and a decrease in work in process.
Since each operator will work on a smaller number of subassemblies, the operator should become more
efficient and thus improve the overall process. Preliminary analysis has shown that one of the assembly
line concepts could reduce the number of operators from 4 to 3 and increase throughput by 26%. The plan
is to retain the fourth operator in the hope of gaining a greater throughput increase.

A  material  flow  analysis  has  not  yet  been  performed.  This  analysis  will  be  done  following  the  assembly 
line  model  completion. 


CONCLUSIONS 

The contractor producing the wing and probe for the Army could not provide the drawings needed for
Archimedes assembly order analysis. The drawings are being converted to the solid model representations
needed for this purpose; approximately 84 parts will need to be redrawn. Varying sequences of assembly of
the wings and probes are possible, but only one process has traditionally been used. Once the drawings
are available to Archimedes, alternative assembly orders can be explored. Parallel assembly processes are
also possible, allowing a greater reduction in overall process time than has been predicted.

The process modeling has demonstrated a 30% increase in throughput with one less labor allocation. The
assembly line approach, as opposed to a single-person assembly work cell, is responsible for this
increase. In addition, the process modeling has provided insight into the scheduling of new kit starts,
thereby decreasing the amount of work in process. A high amount of work in process can have a detrimental
effect on the efficiency of the process, both in the quality of workmanship that goes into each assembly
and in the time needed to accurately track a higher than necessary number of partially completed
assemblies. Hard benefits, such as increased throughput, and soft benefits, such as the effects on work
in process and on the number of operations an operator performs, all come from this new application of
artificial intelligence technology.

The concept of performing and verifying assembly processes before developing the process simulation,
followed by material flow software, is considered a sound one. Additional research and work will be
needed to integrate these tools at a higher level. The currently used standards do not allow a smooth
flow between software packages, and the iterations that must occur between them can be time consuming.
Even so, this type of analysis and verification has not been possible until now. The next step will be
the integration of design, assembly, and material flow with near real-time data visualization and
control, thereby creating a completely virtual environment that can then become reality with a high
degree of confidence in a product's ability to be designed, built, and perform as intended.




Prediction  of  Materials  Properties 




How  Ab  initio  Computer  Simulation  can  Predict 
Materials  Properties  before  Experiment 

Yoshiyuki  Kawazoe 

Institute  for  Materials  Research  (IMR),  Tohoku  University, 
Sendai,  980-8577,  Japan 
Email:  Kawazoe@imr.edu 


ABSTRACT 

Ab initio simulation can now predict materials properties without experimental parameters. To
this end, it is important to avoid any parameters in the calculation that depend on experiments. A
fundamentally new all-electron formulation, the mixed-basis approach, is introduced; it is completely free
from experimental data, and several typical examples of numerical simulation using it are shown. It is
also shown that hierarchical approaches are fundamentally necessary for ab initio simulation to be applied
effectively to real materials.

INTRODUCTION 

New materials are always the basis for fundamental progress in industrial development.
However, most simple binary and ternary alloys have already been studied, and it is becoming
more and more difficult, time consuming, and costly to create useful new materials by experimental
studies alone. To overcome this difficulty, it has long been a dream to predict material properties theoretically
without experimentation. Because of the rapid progress in supercomputer power, it is now possible to
determine physical and chemical properties of materials by ab initio simulation. Material properties can be
estimated theoretically because the systems in which we are interested consist of atoms,
each made up of a nucleus and electrons that interact through the Coulomb force and obey the quantum mechanical
Schrödinger equation. The only difficulty in solving the equation lies in the large number of atoms (about 10^23) to
be treated and in the many-body interactions [1]. (Since ancient times, the three-body problem has been a
synonym for difficult problems.)

However,  to  study  the  vast  variety  of  mechanical,  electrical,  and  magnetic  properties,  it  is  only  necessary  to 
solve  the  equation  for  the  ground  state.  The  local  density  approximation  (LDA)  is  a  practical  method  to 
eliminate  the  many-body  interactions  and  is  known  to  be  able  to  reproduce  the  ground  state  properties 
correctly. To avoid time-consuming calculations, pseudo-potentials are normally used with a plane wave
expansion of the system wave function. Although this method has been widely used, it contains arbitrary
parameters (how to construct pseudo-potentials, etc.). We have developed one of the most accurate
formulations based on LDA, adopting all-electron full-potential mixed-basis wave functions [2, 3]. The
computer code based on this formulation can predict the structure of materials (complete structure
optimization is possible only with our method and is impossible with any method that assumes "muffin tins")
and a wide range of material properties. Some typical numerical results obtained by the program on the
dynamic  behavior  of  clusters,  surfaces,  and  bulk  materials  will  be  introduced  to  indicate  that  new  materials 
that  have  properties  required  by  industry  can  actually  be  predicted  by  computer  simulation. 

It  is  not  possible  to  analyze  interesting  experimental  data  simply  by  applying  ab  initio  calculations,  since 
real materials are very complex and this complexity cannot be handled even by present-day highest-speed
and largest-memory supercomputing systems. To overcome this difficulty, several methods have been
developed and used. The first is simple classical molecular dynamics, with potential
parameters extracted from the ab initio results. Although this method is powerful for simulating large systems, it
can only be applied under the conditions for which the potential parameters were determined [4]. The next
technique is the tight binding (TB) method, in which the TB parameters are likewise fitted to the ab initio results [5]. This
method is far better than classical molecular dynamics, since it recalculates the electronic charge density for
different structures. An example of TB simulation is shown in the next section for carbon nanotubes [6].




The third method is cluster variation, which can estimate the free energy and has been used in materials science
for a long time, starting with empirical parameterization. At present, ab
initio total energy results are used with the cluster variation method [7, 8]. Lastly, a new method called the
direct method has been introduced to treat phonon dispersion. Several groups, including ours, have developed
this method. An example of numerical results obtained with the direct method is given below [9].

As  mentioned  above,  LDA  is  a  good  approximation  for  the  ground  state.  To  simulate  material  properties 
related  to  excited  states,  it  is  necessary  to  introduce  better  approximations.  (Basically  all  experiments  are 
performed  including  excited  states;  it  is  only  possible  to  observe  materials  properties  by  exciting  the 
system!) Therefore, we have added a new computer code to our all-electron program to allow treatment
of excited states. The approximation is known as GW (the Green's function G combined with the screened
Coulomb interaction W). It was developed almost
half a century ago. Unfortunately, it could not be evaluated numerically for a long time because computer
resources were limited. Recently, rapid progress in computer power has made it possible to treat
realistic materials with the GW approximation. We have already simulated several important physical
properties, such as the band gap in typical semiconductors and microclusters [10].

This  paper  is  organized  as  follows:  In  the  next  section,  several  examples  of  the  application  of  all-electron 
mixed-basis  code  are  shown  with  the  TB  calculation  and  cluster  variation  method  (CVM).  Concerning  the 
direct method, simulated results on structural phase transitions are shown.


SIMULATION RESULTS BY THE ALL-ELECTRON MIXED-BASIS APPROACH

Carbon  nanotube  diode 

Present-day state-of-the-art large-scale integrated circuit (LSI) technology uses fabrication sizes on the
order of 0.1 μm. It is desirable to make devices of ever higher density to realize larger memory sizes
for storing high-quality video and to allow higher-speed scientific and engineering calculations. For these aims,
it is not enough to stay within photo-etching technology. To go fundamentally beyond this level of density,
we propose a new nano-scale device based on the carbon nanotube [6].

By doping both positive and negative ions into a nanotube, it is possible to make an N-P junction. Using
this junction as a building block, we can fabricate nano-scale transistors and nano-scale electric devices. In
the simulation, we have used a zigzag nanotube of radius l.lk and length of about 50 Å with 480 carbon atoms,
simulated by the tight binding model. A periodic boundary condition is applied along the tube direction. The
undoped tube is a semiconductor with a 0.6 eV gap. On doping with 6 pairs of ions, donor and acceptor bands
appear, the gap becomes very small (around 0.1 eV), and the behavior of the nanotube is almost metallic. The
transport properties are estimated by treating the nanotube as a simple cylinder. The I-V curve is calculated
using Landauer's formula, and this confirms that the doped nanotube behaves as a nano-scale diode.
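
The Landauer step mentioned above can be sketched numerically as follows. This is only an illustration under stated assumptions, not the calculation reported here: the transmission functions `ballistic` and `gapped` are hypothetical stand-ins for the tight-binding transmission of the doped tube, the bias is assumed to shift the two electrode chemical potentials symmetrically, and a symmetric toy transmission of this kind does not by itself produce rectification, which would need a bias-dependent transmission.

```python
import numpy as np

E_CHARGE = 1.602176634e-19   # elementary charge (C)
PLANCK = 6.62607015e-34      # Planck constant (J s)
K_B_EV = 8.617333262e-5      # Boltzmann constant (eV/K)


def fermi(energy_ev, mu_ev, temperature_k):
    """Fermi-Dirac occupation, written with tanh to avoid overflow."""
    x = (energy_ev - mu_ev) / (K_B_EV * temperature_k)
    return 0.5 * (1.0 - np.tanh(0.5 * x))


def landauer_current(transmission, bias_v, temperature_k=300.0,
                     half_window_ev=2.0, n_grid=8001):
    """I(V) = (2e/h) * integral of T(E, V) [f_L(E) - f_R(E)] dE, in amperes."""
    energies = np.linspace(-half_window_ev, half_window_ev, n_grid)
    f_left = fermi(energies, +0.5 * bias_v, temperature_k)
    f_right = fermi(energies, -0.5 * bias_v, temperature_k)
    integrand = transmission(energies, bias_v) * (f_left - f_right)
    # Trapezoidal rule; the integral is in eV and is converted to joules below.
    integral_ev = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(energies))
    return 2.0 * E_CHARGE / PLANCK * integral_ev * E_CHARGE


def ballistic(energies, bias_v):
    """Hypothetical single, perfectly transmitting channel."""
    return np.ones_like(energies)


def gapped(energies, bias_v, gap_ev=0.1):
    """Hypothetical channel that is closed inside a gap centred at E = 0."""
    return (np.abs(energies) > 0.5 * gap_ev).astype(float)


for v in (0.02, 0.05, 0.10):
    print(v, landauer_current(ballistic, v), landauer_current(gapped, v))
```

For the ballistic channel the current reduces to the conductance quantum 2e²/h times the bias, which gives a quick sanity check of the units; the gapped channel shows how a small gap suppresses the low-bias current.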


Fig. 1. Ion insertion into a carbon nanotube simulated by all-electron mixed-basis molecular dynamics
(snapshots at 0.0 fs, 20.0 fs, and 59.0 fs).

Although we have proposed the carbon nanotube diode, at present it is experimentally very difficult
to pick up a nanotube and dope it with ions. In particular, it seems impossible to insert ions from the tube edge.
The only plausible way to insert positive and negative ions into the nanotube is to shoot them through the wall.
Even the hexagon seems too small for ions to pass through. We have performed an all-electron mixed-basis
molecular dynamics simulation and confirmed that this process is possible. When the ion comes close to
the nanotube, the hexagon or pentagon in the wall of the nanotube opens and the ion shrinks because of
charge transfer, so that inclusion is finally realized. The initial kinetic energy needed to achieve this process is
around 40 eV. A lower kinetic energy is insufficient for an ion to penetrate into the nanotube, while a higher
energy damages the wall, and the ion passes through the nanotube and escapes. Figure 1 shows the
process of ion insertion into the nanotube. The process takes about 40 fs [11].

Defect  in  bulk  iron 

Estimating the effect of defects in crystals is an important subject in materials science generally. Among
these, defects in iron have been a central theme in nuclear reactor studies. We have applied our all-electron
formulation to analyze this phenomenon. An all-electron full-potential mixed-basis simulation is performed
using a bcc unit cell with the experimental nearest-neighbor distance of 2.48 Å. The atomic orbitals
used in this calculation are the 1s, 2s, 2p, and 3s orbitals, together with 3,151 plane waves (PWs) corresponding
to a 178 Ry cutoff energy. The number of k-points is 40 inside the irreducible part of the first Brillouin zone. The
resulting  magnetic  moment  is  2.  1S[ib,  which  is  closer  to  the  experimental  value  than  previous  results.  In  the 
present calculation, not only the Hellmann-Feynman forces but also the variational forces are
determined explicitly in the molecular dynamics simulation. Although ab initio total energy calculations
and ab initio molecular dynamics simulation can be applied to determine the energetics and locally stable
structures of metallic systems with vacancies and interstitial atoms, such estimates are very difficult,
since there is no translational symmetry.

The numerical results of our simulation are shown below for a bcc iron crystal [12]. The vacancy formation
energy of the crystal is estimated as the difference between the total energies of the perfect crystal and of a
system with a single vacancy. For the perfect system, we put 16 atoms inside a cubic unit cell of size
5.73 Å × 5.73 Å × 5.73 Å, while for the system with a vacancy we put 15 atoms inside the same unit cell.
Although this system is not large enough, atomic displacements surrounding the vacancy are not so important
in comparison with the interstitial case. Comparing the total energy of the 15-atom defect system with that of
the 16-atom perfect system, we obtain a vacancy formation energy of 1.8 eV using the LDA pseudo-potential approach
and 1.2±0.2 eV using the all-electron full-potential mixed-basis approach with the local spin density
approximation, which are comparable to the reported experimental value of 1.8 eV. In the calculation
performed with the all-electron full-potential mixed-basis approach, the number of k-points in the
irreducible Brillouin zone is set to 4 and the PW cut-off energy is 16.5 Ry (the number of PWs is 1,419).
The present results could still be improved by expanding the size of the simulation cell, which requires
more computer resources.
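
Two of the quantities in this paragraph can be checked with a short sketch. It rests on assumptions that are not stated in the text: the plane waves are counted for a simple cubic supercell at the zone centre only, with the kinetic energy of a plane wave taken as |G|² in Rydberg when G is in inverse bohr, and the formation energy uses the common convention in which the perfect-crystal energy is scaled to the number of atoms in the defect cell. The energies passed to `vacancy_formation_energy` below are hypothetical placeholders, not results from the calculation.

```python
import itertools
import math

BOHR_IN_ANGSTROM = 0.529177210903


def count_plane_waves(cell_edge_angstrom, cutoff_ry):
    """Count reciprocal-lattice vectors G of a cubic cell with |G|^2 <= cutoff (Ry)."""
    edge_bohr = cell_edge_angstrom / BOHR_IN_ANGSTROM
    g_unit_sq = (2.0 * math.pi / edge_bohr) ** 2   # |G|^2 of the shortest G, in Ry
    n_max = int(math.sqrt(cutoff_ry / g_unit_sq)) + 1
    indices = range(-n_max, n_max + 1)
    return sum(
        1
        for n in itertools.product(indices, repeat=3)
        if g_unit_sq * (n[0] ** 2 + n[1] ** 2 + n[2] ** 2) <= cutoff_ry
    )


def vacancy_formation_energy(e_defect, e_perfect, n_perfect):
    """E_f = E(N-1 atoms, one vacancy) - (N-1)/N * E(N atoms, perfect crystal)."""
    return e_defect - (n_perfect - 1) / n_perfect * e_perfect


n_pw = count_plane_waves(5.73, 16.5)
print("plane waves below cut-off:", n_pw)

# Hypothetical total energies (eV) for a 16-atom perfect cell and a 15-atom defect cell.
print("formation energy (eV):", vacancy_formation_energy(-148.5, -160.0, 16))
```

Under these assumptions the count for the 5.73 Å cell at 16.5 Ry comes out at 1,419, in line with the number quoted above.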

Cluster  variation  method 

The standard ab initio LDA calculation gives information on the ground state total energy, the stable
structure of a material, and its physicochemical properties. To reveal the behavior of materials at finite
temperature, it is necessary to study thermodynamic properties. To this end, the cluster variation
method (CVM) is used to determine the phase diagram: the free energy is estimated using energy
parameters for clusters in complex systems obtained from ab initio calculations.

CVM can also be applied to determine the site preference of ternary additions in alloys. We have studied the
site preferences of transition metals in Ni3Al and other important alloy systems to analyze and predict, for
example, phase stability at low temperature for high temperature materials [7]. Another application of
CVM is to study the behavior of an interface. The Al-Al3Li system has been studied as an
example [8]. We have used a supercell consisting of 38 fcc cubes in the calculation, including a (100)
interphase boundary (IPB). Part of the results obtained is shown in Figure 2. It clearly shows that at the
IPB the Li concentration varies from one phase to the other. At 400 K the thickness of this transition region is
about 4 lattice constants, and it is a linear function of temperature.




Fig. 2. (left) Li concentration averaged over (100) planes parallel to the IPB as a function of distance
(in units of the lattice constant) at 400 K. The solid and dotted lines are drawn to guide the eye.
(right) Estimated width of the IPB as a function of temperature (K).


Direct  method 

Ab initio calculations have mainly been applied to estimate electromagnetic properties of solids. In
contrast, it has been very difficult to analyze mechanical properties based on first-principles simulations.
The direct method was recently invented to study phonon dispersions and phase transitions in crystalline
solids based on ab initio calculations [9]. This approach is one of the first attempts to simulate mechanical
properties in crystals. The calculation of the phonon frequencies of a crystal is one of the fundamental
elements required to study phase stability, phase transitions, and thermodynamics. Another ab initio
approach, the linear response method, has difficulty in determining phonon dispersions, since the
dielectric matrix must be calculated in terms of the electronic eigenfunctions of the perfect crystal [13]. In the
direct method, the dynamical matrix is constructed using cumulant force constants determined numerically
from the Hellmann-Feynman forces, and this matrix is diagonalized to obtain the phonon dispersions.
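
The workflow of the direct method (displace an atom in a supercell, record the forces, build the dynamical matrix, obtain the frequencies) can be illustrated on a toy model. The sketch below is an assumption-laden stand-in, not the procedure of [9]: it uses a one-dimensional periodic chain with harmonic nearest-neighbour springs in place of Hellmann-Feynman forces from an ab initio code, and all function names are hypothetical.

```python
import numpy as np


def force_constants_from_displacement(n_atoms, spring_k, displacement=1e-3):
    """Displace atom 0 in a periodic chain and return Phi(0, j) = -F_j / u_0."""
    u = np.zeros(n_atoms)
    u[0] = displacement
    # Harmonic nearest-neighbour forces stand in for Hellmann-Feynman forces.
    forces = spring_k * (np.roll(u, 1) + np.roll(u, -1) - 2.0 * u)
    return -forces / displacement


def phonon_frequencies(phi, mass, lattice_a, q_points):
    """Fourier-transform the force constants and return signed frequencies."""
    n_atoms = len(phi)
    cell = np.arange(n_atoms)
    cell = np.where(cell > n_atoms // 2, cell - n_atoms, cell)  # minimum image
    positions = lattice_a * cell
    dyn = np.array(
        [np.sum(phi * np.exp(1j * q * positions)).real for q in q_points]
    ) / mass
    # A negative eigenvalue (imaginary frequency, i.e. a soft mode) is reported
    # as a negative number, a common plotting convention.
    return np.sign(dyn) * np.sqrt(np.abs(dyn))


phi = force_constants_from_displacement(n_atoms=8, spring_k=1.0)
q = np.linspace(0.0, np.pi, 5)
omega = phonon_frequencies(phi, mass=1.0, lattice_a=1.0, q_points=q)
print(omega)
```

For this chain the result can be compared with the textbook dispersion 2·sqrt(K/m)·|sin(qa/2)|. In a real three-dimensional crystal the dynamical matrix is a 3n × 3n matrix per wave vector and must be diagonalized rather than evaluated as a single number.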

We have successfully applied the direct method to study the structural phase transition in cubic ZrO2. The force
constants are determined from the Hellmann-Feynman forces induced by the displacement of atoms in a 2 × 2 × 2
fcc supercell. The calculated phonon dispersions in Figure 3 clearly show a soft mode at the X point, which
corresponds to the experimentally confirmed cubic-to-tetragonal structural phase transition.

CONCLUSIONS 

This paper has discussed, with several examples, the present possibility of predicting
materials properties without any arbitrary and/or experimental parameters. It is no longer a dream to
solve numerically the basic quantum mechanical equations for realistic materials of industrial interest.


ACKNOWLEDGEMENTS 

The author thanks all his collaborators who have contributed to this work. In particular, Profs. K. Parlinski,
K. Ohno, and M. Sluiter, Drs. K. Esfarjani, Y. Maruyama, H. Kamiyama, and Z.-Q. Li, and Mrs. A. Farajian
and K. Shiga are the researchers who provided the new concepts and numerical simulations presented in
this paper. He is also grateful to the crew of the Information Science Group at the Institute for Materials
Research, Tohoku University, for their continued support of the HITAC S-3800/380 supercomputer system.


Fig. 3. Phonon dispersion of ZrO2 in the cubic structure estimated by the direct method.


REFERENCES 

1. Y. Kawazoe, K. Ohno, and K. Esfarjani, 1999. "Materials Science - Molecular Dynamics and Monte
Carlo Simulation", Springer-Verlag (in press).

2. K. Ohno, Y. Maruyama, K. Esfarjani, Y. Kawazoe, N. Sato, R. Hatakeyama, T. Hirata, M.
Niwano, 1996. Phys. Rev. Lett. 76, 3590.

3. Y. Maruyama, K. Ohno, Y. Kawazoe, 2000. To be published.

4. T. Aihara and Y. Kawazoe, 1997. Prog. Theor. Phys. S126, 355.

5. K. Esfarjani, Y. Hashi, J. Onoe, K. Takeuchi, Y. Kawazoe, 1998. Phys. Rev. B 57, 223.

6. K. Esfarjani, Y. Hashi, A. A. Farajian, Y. Kawazoe, 1997. Proc. of IPMM97, 171.

7. M. Sluiter, Y. Kawazoe, 1996. Phys. Rev. B 54, 10381.

8. M. Sluiter, M. Asta, Y. Kawazoe, 1996. Sci. Rep. RITU A 41, 97.

9. K. Parlinski, Z.-Q. Li, Y. Kawazoe, 1997. Phys. Rev. Lett. 78, 4063.

10. M. Ishii, K. Ohno, Y. Kawazoe, 2000. To be published.

11. A. A. Farajian, K. Ohno, K. Esfarjani, Y. Maruyama, Y. Kawazoe, 2000. To be published.

12. K. Ohno, Y. Maruyama, H. Kamiyama, E. Bei, K. Shiga, Z.-Q. Li, K. Esfarjani, Y. Kawazoe, 1999.
In 'Mesoscopic Dynamics of Fracture: Computational Materials Design', edited by H. Kitagawa, T.
Aihara, Jr., and Y. Kawazoe, Vol. 1 of Advances in Materials Research, Springer-Verlag.

13. R. D. King-Smith, R. J. Needs, 1990. J. Phys.: Condensed Matter 2, 3431.




Data  Driven  Knowledge  Extraction  of 
Materials  Properties 

J.S.  Kandola*,  S.R.  Gunn*,  I.  Sinclair**,  P.A.S.  Reed** 

* ISIS Research Group, Department of Electronics and Computer Science,
**  Engineering  Materials  Research  Group, 

School  of  Engineering  Sciences,  University  of  Southampton,  U.K. 


ABSTRACT 

In  this  paper  the  problem  of  modelling  a  large  commercial  materials  dataset  using  advanced  adaptive 
numeric  methods  is  described.  The  various  approaches  are  outlined,  emphasising  their  characteristics  with 
respect  to  generalisation,  performance  and  transparency.  A  highly  novel  Support  Vector  Machine  (SVM) 
approach  is  taken  incorporating  a  high  degree  of  transparency  via  a  full  ANalysis  Of  VAriance  (ANOVA) 
expansion.  Using  an  example  which  predicts  0.2%  proof  stress  from  a  set  of  materials  features,  different 
modelling  techniques  are  compared  by  benchmarking  against  independent  test  data. 


INTRODUCTION 

The development of empirical models is fundamental to understanding complex materials properties within
the field of materials science [1,2]. Models may then be used to discern the physical relationships that exist
and enable optimisation of materials production. Empirical modelling is the extraction of system
relationships from observational data to produce a model of the system, from which it is possible to predict
responses of that system. Ultimately the quantity and quality of the observations govern the performance of
the empirical model. Often only partial knowledge is available about the physical processes involved,
although significant amounts of 'raw' data may be accessible from production and product release records,
which may then be used to construct a data-driven model.

The  empirical  study  of  materials  phenomena  through  statistical  models  has  a  number  of  limiting 
characteristics. Consider a dataset $D_N = \{x_i, y_i\}_{i=1}^{N}$, drawn from an unknown probability distribution, $F$,
where $x_i$ represents a set of inputs (e.g. alloy composition and thermomechanical processing information),
$y_i$ represents a set of outputs (e.g. mechanical properties), and $N$ represents the number of data points. The
empirical modelling problem is to find any underlying mapping $x \to y$ consistent with the dataset $D_N$. Due to
their  observational  nature,  the  data  obtained  are  finite.  Typically,  this  sampling  is  non-uniform  and,  due  to 
the  high  dimensional  nature  of  the  problems  of  interest  (i.e.,  large  numbers  of  inputs),  the  data  only  forms  a 
sparse  distribution  in  input  space.  Consequently  the  problem  is  nearly  always  ill-posed  in  the  sense  of 
Hadamard  [3].  To  address  this  ill-posed  nature,  it  is  necessary  to  convert  the  problem  to  one  that  is  well- 
posed.  To  be  well-posed,  a  unique  solution  must  exist  that  varies  continuously  with  the  data.  We  consider 
various  modelling  approaches  intended  to  transform  the  problem  to  one  that  is  well-posed.  A  further 
limitation  of  any  empirical  modelling  technique  is  its  ability  to  resolve  the  problem  of  highly  correlated 
inputs;  if  two  inputs  are  highly  correlated  it  is  difficult  to  identify  individual  effects  on  the  output. 
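
The effect of highly correlated inputs, and one standard way of restoring a well-posed problem, can be illustrated with a small synthetic example. The sketch below uses Tikhonov (ridge) regularisation, which is chosen here purely for illustration and is not one of the specific models compared in this paper; the data, the noise levels, and the value of `lam` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 50
x1 = rng.normal(size=n_samples)
x2 = x1 + 1e-3 * rng.normal(size=n_samples)   # almost a copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=n_samples)


def ridge_fit(inputs, targets, lam):
    """Solve (X^T X + lam I) w = X^T y; lam = 0 is ordinary least squares."""
    gram = inputs.T @ inputs + lam * np.eye(inputs.shape[1])
    return np.linalg.solve(gram, inputs.T @ targets)


def gram_condition(inputs, lam):
    """Condition number of the regularised normal-equation matrix."""
    return np.linalg.cond(inputs.T @ inputs + lam * np.eye(inputs.shape[1]))


w_ols = ridge_fit(X, y, lam=0.0)
w_ridge = ridge_fit(X, y, lam=1.0)
print("least squares weights:", w_ols, "condition:", gram_condition(X, 0.0))
print("regularised weights:  ", w_ridge, "condition:", gram_condition(X, 1.0))
```

Without regularisation the individual weights of the two near-identical inputs are poorly determined even though their sum is well determined; the penalty term makes the solution unique and stable at the cost of a small bias.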

The  work  presented  in  this  paper  compares  and  contrasts  common  empirical  models,  and  state  of  the  art 
approaches,  on  the  basis  of  their  generalisation  ability  and  transparency.  This  paper  advocates  a  transparent 
approach  to  the  modelling  problem,  which  enables  understanding  of  the  underlying  relationships  between 
inputs  and  outputs.  This  knowledge  can  then  be  used  to  enhance  model  validation  through  comparison  with 
prior  physical  knowledge.  Generalisation  performance  is  the  assessment  of  model  predictions  to  new  and 
unseen  data.  Traditional  empirical  modelling  approaches  may  suffer  in  terms  of  generalisation,  producing 
models  that  can  overfit  the  data.  Typically,  this  is  a  consequence  of  the  model  selection  procedure  which 
controls  the  complexity  of  the  model.  For  a  given  learning  task,  with  a  finite  amount  of  training  data,  the 
best  generalisation  performance  will  be  achieved  if  the  "capacity"  of  the  model  is  matched  to  the 
complexity  of  the  underlying  process. 


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




THE  MATERIALS  DATASET 

In  this  paper  we  consider  an  extensive  commercial  dataset  for  Aluminium  alloy  2024  in  a  T351  temper, 
with  the  objective  being  to  predict  0.2%  proof  stress.  The  "raw"  dataset  consists  of  35  input  variables  and 
2870  data  pairs  covering  various  compositional  and  thermomechanical  processing  parameters,  as  well  as 
containing  "shop  floor"  information  such  as  plate  numbers  and  date  of  alloy  manufacture. 

For  a  physically  amenable  model  to  be  constructed,  the  original  data  set  was  decomposed  into  a  smaller 
subset  based  on  a  single  tensile  direction  (LT),  thickness  position  (C),  and  a  width  position  (0.5).  All  of  the 
major alloying elements and the major impurities were retained as inputs to the model; however, the minor
compositional  information  was  removed.  The  "shop  floor"  information  was  also  removed  since  it  was  not 
expected  to  contribute  directly  to  proof  stress,  but  does  provide  a  valuable  check  for  changes  in  processing 
methods,  equipment  etc.  Assessment  of  the  slab  dimensional  information  revealed  the  majority  of  the  slab 
width  and  the  slab  gauges  to  be  fixed;  as  a  consequence  the  dataset  which  was  used  for  modelling 
contained  information  for  a  single  slab  width/gauge  combination.  The  initial  scalped  slab  gauges  on 
inspection  were  found  to  be  equal,  and  as  such  the  total  reduction  of  each  plate  is  entirely  defined  by  the 
final  gauge.  The  hot-rolled  width  and  length  were  used  to  define  reduction  in  the  longitudinal  and 
transverse  directions;  hence  a  "reduction-ratio"  was  computed  as  the  ratio  of  engineering  strain  in  the  long 
and  transverse  directions  between  the  slab  and  the  final  plate.  This  stage  of  data  pre-processing  left  a 
reduced  size  dataset:  the  input  variables  comprised  ten  characteristics;  the  final  gauge  (FG),  Cu,  Fe,  Mg, 
Mn,  Si  (in  weight  percent),  slab  length  (SL),  solution  treatment  time  (STT),  percentage  stretch  (%st.),  and 
reduction  ratio  (RR).  After  removing  the  entries  with  missing  and  repeated  values,  290  data  points 
remained.  Before  any  of  the  modelling  techniques  were  used  to  predict  the  proof  stress,  the  dataset  was 
normalised  to  have  a  mean  of  zero  and  unit  variance. 
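
The final normalisation step can be sketched as follows. The array below is random placeholder data with the same shape as the reduced dataset (290 points, ten inputs), and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not say which estimator was used.

```python
import numpy as np


def standardise(data):
    """Scale each column to zero mean and unit variance; also return the scaling."""
    mean = data.mean(axis=0)
    std = data.std(axis=0, ddof=1)
    return (data - mean) / std, mean, std


rng = np.random.default_rng(0)
raw = rng.normal(loc=5.0, scale=3.0, size=(290, 10))   # placeholder data only
scaled, column_mean, column_std = standardise(raw)
print(scaled.mean(axis=0).round(12))
print(scaled.var(axis=0, ddof=1).round(12))
```

Keeping `column_mean` and `column_std` allows unseen test data to be scaled with the statistics of the training data rather than with its own.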


MODELLING  TECHNIQUES 

This  section  considers  the  adaptive  numeric  methods  used  to  predict  proof  stress  based  on  a  dataset 
described  in  the  previous  section.  Three  techniques  were  considered:  (i)  Multivariate  Linear  model,  (ii) 
Bayesian  multi-layer  perceptron,  (iii)  Support  Vector  Machine.  Data  structure  was  also  examined  using  a 
graphical Gaussian model. Each of these models (except the graphical Gaussian model) is assessed
against the others quantitatively using the mean squared error (MSE) test statistic, and qualitatively in terms of transparency.

Graphical  Gaussian  Models 

As the dimensionality of the problem domain increases, graphical models and graphical representations are
playing  an  increasingly  important  role  in  statistics,  and  empirical  modelling  in  particular.  Relationships 
between  variables  in  a  model  can  be  represented  graphically  by  edges  in  a  graph  where  the  nodes  represent 
the  data  variables.  Such  graphs  provide  qualitative  representations  of  the  conditional  independence 
structure  of  the  model,  as  well  as  simplifying  inference  in  highly  structured  stochastic  systems. 

Let $X$ be a $k$-dimensional vector of random variables. A conditional independence graph [4], $G = (V, E)$,
describes the association structure of $X$ by means of a graph, specified by the vertex set $V$ and the edge
set $E$. Conditional independence is an attractive method to generalise the relation between two variables. A
graphical model is then a family of probability distributions $P_G$ that is a Markov distribution over $G$. A
graphical  Gaussian  model  is  obtained  when  only  continuous  random  variables  are  considered.  If  we  can 
assume  that  the  data  has  been  drawn  from  a  Gaussian  distribution,  then  there  is  no  loss  of  information  by 
condensing  the  data  into  the  sample  mean  vector,  and  the  sample  variance-covariance  matrix.  A  symmetric 
correlation  coefficient  matrix  can  then  be  obtained  from  this  matrix.  To  construct  the  graphical  model  it  is 
necessary  to  test  for  the  presence  or  otherwise  of  dependencies  between  the  variables.  Using  a  scaled 
inverse correlation matrix, a second matrix, the deviance matrix, can be computed using equation 1, where $X_b$ and $X_c$
represent the pair of variables being tested for conditional independence given the other
variables in the dataset, $X_a$, and $\mathrm{corr}_N$ is the corresponding sample partial correlation coefficient.
This test statistic has an asymptotic chi-squared distribution with one degree of
freedom.


$$\mathrm{dev}(X_b \perp\!\!\!\perp X_c \mid X_a) = -N \ln\left(1 - \mathrm{corr}_N^{2}(X_b, X_c \mid X_a)\right) \qquad (1)$$




This second matrix, the deviance matrix, measures the overall goodness of fit of a graphical model by
carrying out a hypothesis test at the 95% confidence level of the chi-squared distribution. Figure 1 illustrates
the graphical model obtained for the materials dataset.
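
The edge test described above can be sketched in a few lines. This is a minimal illustration, assuming that the partial correlations are read off the scaled inverse of the sample correlation matrix and that an edge is kept when its deviance exceeds 3.841, the 95% point of the chi-squared distribution with one degree of freedom. The three-variable chain used as data is synthetic and hypothetical, and the function names are not from the paper.

```python
import numpy as np

CHI2_95_1DOF = 3.841   # 95% point of chi-squared with one degree of freedom


def edge_deviance(data):
    """Return partial correlations and edge-exclusion deviances for all pairs."""
    n_samples = data.shape[0]
    corr = np.corrcoef(data, rowvar=False)
    precision = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(precision))
    # Off-diagonal entries of the scaled inverse are minus the partial correlations.
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 0.0)
    deviance = -n_samples * np.log(1.0 - partial ** 2)
    return partial, deviance


def retained_edges(deviance, threshold=CHI2_95_1DOF):
    """List variable pairs whose deviance rejects conditional independence."""
    k = deviance.shape[0]
    return [(a, b) for a in range(k) for b in range(a + 1, k)
            if deviance[a, b] > threshold]


# Synthetic chain x0 -> x1 -> x2: x0 and x2 are independent given x1.
rng = np.random.default_rng(1)
x0 = rng.normal(size=2000)
x1 = x0 + 0.5 * rng.normal(size=2000)
x2 = x1 + 0.5 * rng.normal(size=2000)
partial, deviance = edge_deviance(np.column_stack([x0, x1, x2]))
print(np.round(deviance, 1))
print(retained_edges(deviance))
```

The two direct links of the chain give very large deviances, while the deviance for the pair (x0, x2) behaves like a chi-squared variate with one degree of freedom, so that edge is dropped in most realisations.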


Fig.  1.  Graphical  Gaussian  model  for  the  materials  dataset. 

This  figure  provides  a  powerful  tool  to  visualise  the  complex  interactions  between  different  data  variables. 
For a 10-input dataset, a total of 45 relations are possible; however, only approximately 50% of the relations are
deemed to be significant at the 95% confidence level. The graphical model suggests that proof stress (PS)
could  only  be  directly  predicted  from  final  gauge  (FG),  solution  treatment  time  (STT)  and  percentage 
stretch  (%st.)  since  these  are  the  only  variables  with  which  proof  stress  has  direct  links.  However,  the 
representation  shows  that  there  is  a  large  complex  hierarchy  between  the  different  data  variables.  The  links 
which  are  established  could  represent  true  physical  dependencies  between  the  different  variables,  as  well  as 
representing  manufacturing  artefacts  (for  example  the  use  of  master  alloys  in  altering  composition). 

The  Multivariate  Linear  Model 

A  multivariate  linear  model  is  given  by  equation  2: 

$$y = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n \qquad (2)$$

where $x_1, \ldots, x_n$ is the model's input vector, $w_1, \ldots, w_n$ are unknown parameters to be estimated, and $w_0$ is a
bias term. The unknown vector of parameters, $w$, can be estimated together with the associated parametric
uncertainty in the standard least squares sense [5].

Low  parametric  uncertainty  standard  deviation  values,  relative  to  the  parameter  values,  are  desirable  since 
they  imply  more  confidence  in  these  parameter  values,  and  hence  more  significance  in  the  inputs.  The 
parameter values and their associated uncertainty are given in Table 1.
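
A least squares fit with parametric uncertainties of the kind reported in Table 1 can be sketched as below. The sketch assumes the usual estimate in which the parameter covariance is the residual variance times the inverse of the normal-equation matrix; the data are synthetic, and the "true" weights used to generate them are hypothetical values, not the fitted values of this study.

```python
import numpy as np


def fit_linear_model(x, y):
    """Return weights (bias first), their standard deviations, and the training MSE."""
    n_samples, n_inputs = x.shape
    design = np.column_stack([np.ones(n_samples), x])
    weights, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ weights
    dof = n_samples - (n_inputs + 1)
    noise_var = residuals @ residuals / dof
    covariance = noise_var * np.linalg.inv(design.T @ design)
    return weights, np.sqrt(np.diag(covariance)), float(np.mean(residuals ** 2))


rng = np.random.default_rng(0)
x = rng.normal(size=(290, 3))                       # standardised placeholder inputs
true_w = np.array([360.0, -18.0, 24.0, 0.0])        # hypothetical bias and weights
y = true_w[0] + x @ true_w[1:] + rng.normal(size=290)
weights, std_devs, mse = fit_linear_model(x, y)
for w, s in zip(weights, std_devs):
    print(f"{w:8.2f} +/- {s:.2f}")
print("MSE:", mse, "tolerance (one s.d.):", np.sqrt(mse))
```

A weight whose standard deviation is comparable to its magnitude, like the last one here, cannot be distinguished from zero, which is the sense in which low relative uncertainty implies a more significant input.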




Table 1: Weight gains and parametric uncertainty values for the materials dataset.

Parameter     Bias     FG      Cu     Fe     Mg      Mn     Si      SL      STT    %st.   RR
Weight        360.0   -18.1    5.30   4.50   -0.58   4.86   -8.01   -0.41   24.4   21.5   -2.1
Uncertainty     4.85    3.45   5.05   3.39    6.12   3.80    3.64    3.37   8.20   3.98   5.68

Since the data are normalised, the size of the weight gains can be directly interpreted to show the first order
importance of these variables in affecting the output. The bias term, the final gauge, solution treatment time
and percentage stretch show the biggest values with lowest uncertainty. Selection of these variables is
consistent with the graphical model dependencies and the expected physical behaviour. However, it must
be borne in mind that the graphical model suggests that a complex system of interdependencies exists in this
dataset. The MSE obtained for the linear model was 144.13 MPa², which represents a tolerance of 12 MPa
on the proof stress (one standard deviation in the errors between target and predictions).
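
The least-squares estimate and its parametric uncertainty can be illustrated with a minimal sketch. The data below
are synthetic, and the function name `fit_linear`, the chosen weights and the noise level are assumptions made
purely for illustration; they are not the materials dataset or the authors' code.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with a bias term.

    Returns the weights (bias first) and their standard deviations."""
    n_samples = X.shape[0]
    A = np.column_stack([np.ones(n_samples), X])    # prepend the bias column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ w
    dof = n_samples - A.shape[1]                    # degrees of freedom
    sigma2 = residuals @ residuals / dof            # unbiased noise variance
    cov = sigma2 * np.linalg.inv(A.T @ A)           # parameter covariance
    return w, np.sqrt(np.diag(cov))

# Synthetic, normalised inputs (hypothetical stand-ins for the real variables)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
true_w = np.array([360.0, -18.0, 24.0, 0.0])
y = true_w[0] + X @ true_w[1:] + rng.normal(scale=12.0, size=200)

w, w_std = fit_linear(X, y)
mse = np.mean((y - (w[0] + X @ w[1:])) ** 2)
tolerance = np.sqrt(mse)                            # one standard deviation of the errors
```

A weight whose standard deviation is small relative to its magnitude is well determined; the third input above has a
true weight of zero, so its estimate should be comparable in size to its uncertainty. The quoted tolerance follows the
same square-root relation: the square root of 144.13 MPa² is approximately 12.0 MPa.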


Bayesian  Multi-layer  Perceptron 

A Bayesian multi-layer perceptron (MLP) encompasses all of the key features of the classical MLP, but
differs in that network training takes place using Bayesian learning [6,7]. The result of Bayesian learning is
a probability distribution over model parameters that expresses our degrees of belief regarding how likely
the different parameter values are. Initially a wide prior distribution is defined, which might express some
rather general properties such as smoothness of the network function, but will otherwise leave the weight
values fairly unconstrained. Upon observation of the data, this wide prior distribution is converted to a
narrower posterior distribution by using Bayes' theorem. This illustrates the fact that we have learned
something about the extent to which different weight values are consistent with the observed data.


Following  the  work  of  MacKay  [8]  a  Gaussian  prior  was  chosen  for  the  initial  weight  values  (including 
bias),  corresponding  to  the  use  of  weight  decay  regularisation  which  controls  the  network  capacity. 
Bayesian  learning  in  an  MLP  simplifies  to  finding  the  weight  vector,  w,  which  minimises  the  cost  function, 


$$E(w) = \frac{\beta}{2}\sum_{n=1}^{N}\{y(x_n; w) - t_n\}^2 + \frac{\alpha}{2}\sum_{i=1}^{W} w_i^2$$


This cost function was minimised using the scaled conjugate gradient algorithm [6] while the hyperparameters
$\alpha$ and $\beta$ were continuously re-estimated using the evidence framework. Automatic Relevance
Determination (ARD) [8] was also used as a form of input selection. A single hidden layer with varying numbers
of nodes was used to model the relationship between inputs and outputs. For assessment, cross-validation was
used. A plot of the MSE for both training and test data sets is shown in Figure 2 for increasing numbers of
hidden nodes.
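
As a minimal sketch of the weight-decay cost described above, assuming the standard evidence-framework form
(a sum-of-squares data term scaled by $\beta$ plus a squared-weight penalty scaled by $\alpha$), the function below
evaluates the cost for given predictions. The function name and the example numbers are illustrative assumptions.

```python
import numpy as np

def regularised_cost(y_pred, t, w, alpha, beta):
    """E(w) = beta/2 * sum((y - t)^2) + alpha/2 * sum(w^2)."""
    data_term = 0.5 * beta * np.sum((y_pred - t) ** 2)   # misfit to the targets
    weight_term = 0.5 * alpha * np.sum(w ** 2)           # weight decay penalty
    return data_term + weight_term

# Two predictions, two targets, two weights (made-up values)
cost = regularised_cost(np.array([1.0, 2.0]), np.array([0.0, 0.0]),
                        np.array([1.0, -1.0]), alpha=2.0, beta=1.0)
```

Increasing `alpha` relative to `beta` penalises large weights more heavily, which is how the regulariser controls
network capacity.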


Fig. 2. MSE on the training data and the test data for increasing numbers of hidden nodes.




An increase in the number of hidden nodes corresponds to an increase in complexity. The initial MSE
values for the test set show a consistent decrease as the number of hidden nodes increases up to
7 hidden nodes, after which the MSE increases, suggesting that overfitting of the data has started to
occur. The MSE for the training set shows a decreasing error for increasing hidden nodes, a manifestation
of the fact that increasing complexity corresponds to an increase in regularisation. The training data showed
an MSE of 65.0 whilst the test data showed an MSE of 84.2, corresponding to an effective standard deviation
of 8 MPa on the training data and 9 MPa on the test data.

Support  Vector  Machines  (SVMs) 

SVMs have recently received intensified research effort, owing to many attractive features and promising
empirical performance. The formulation embodies the principle of structural risk minimisation (SRM)
developed by Vapnik [9]. SRM differs from the common empirical risk minimisation (ERM) by trying to
minimise an upper bound on the expected risk, rather than minimising the training set error. If the VC
dimension is low, the potential to overfit is low, enabling good generalisation. SVMs nonlinearly transform
the original input space into a higher dimensional feature space by using reproducing kernels. The only way
the data appear in the training problem is in the form of dot products. The use of a kernel function enables
operations to be performed in the input space rather than the potentially high-dimensional feature space.
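
The role of the kernel can be illustrated with a small sketch. The homogeneous quadratic kernel used here is chosen
only for illustration (the kernel used for the materials data is not stated in this section), and the function names
are hypothetical. The point is that evaluating the kernel in the two-dimensional input space gives the same number
as a dot product in the explicit three-dimensional feature space.

```python
import numpy as np

def poly2_kernel(x, z):
    """Quadratic kernel evaluated directly in the input space."""
    return float(np.dot(x, z)) ** 2

def poly2_features(x):
    """Explicit feature map whose dot product reproduces poly2_kernel (2-D inputs)."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
k_input = poly2_kernel(x, z)                                       # input-space evaluation
k_feature = float(np.dot(poly2_features(x), poly2_features(z)))    # feature-space dot product
```

For higher-order kernels the feature space grows rapidly, whereas the cost of the input-space evaluation stays
essentially unchanged.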

SVMs, like the Bayesian MLP, are essentially black box models; however, transparency can be introduced
by use of the SUPANOVA framework [10]. SUPANOVA promotes a sparse representation within an
ANOVA (ANalysis Of VAriance) representation, which provides a transparent approach to modelling.
It describes the decomposition of the function into additive components, with the objective being to
represent this function by a subset of terms from an expansion,

$$f(x) = f_0 + \sum_{i=1}^{d} f_i(x_i) + \sum_{i=1}^{d}\sum_{j=i+1}^{d} f_{ij}(x_i, x_j) + \cdots + f_{1,2,\ldots,d}(x) \qquad (7)$$

where $d$ denotes the number of input variables.

The solution to this problem is represented by a sum of univariate, bivariate and trivariate ANOVA terms.
For this dataset 1024 different terms were possible; however, only 20 terms were chosen as being
significant. Figure 3 shows a selection of the regression surfaces for the univariate and bivariate ANOVA
terms. The MSE for the training set was 61.6 whilst the generalisation MSE was 79.8, a tolerance of 8.9
MPa on the proof stress values.
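
The figure of 1024 possible terms is consistent with a full ANOVA expansion over ten inputs, since each term
corresponds to one subset of the inputs (the empty subset being the constant term) and 2^10 = 1024. Reading the
ten non-bias variables of Table 1 as the inputs is an inference, not something stated in the text; the generic names
`x1` to `x10` and the function `anova_terms` in the sketch below are placeholders.

```python
from itertools import combinations

def anova_terms(names, max_order=None):
    """Enumerate ANOVA terms as subsets of the inputs, up to a given interaction order."""
    max_order = len(names) if max_order is None else max_order
    return [subset for k in range(max_order + 1) for subset in combinations(names, k)]

names = [f"x{i}" for i in range(1, 11)]          # ten generic inputs
all_terms = anova_terms(names)                   # full expansion
low_order = anova_terms(names, max_order=3)      # constant, univariate, bivariate, trivariate
```

Under this reading, a sparse model with 20 terms retains only a small fraction of the candidate components.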


Fig. 3. Univariate and bivariate plots for the materials dataset.

These regression surfaces represent interaction terms: to be interpretable, the terms which appear in the
univariate plots must be added across the relevant dimension should they also occur in the bivariate (or higher
order) regression surfaces. The regressions may be compared with physical trends, with those shown in
Figure 3 showing increasing strength with increasing silicon concentration and percentage stretch, but
decreasing strength with final gauge.


DISCUSSION 

This  paper  has  described  the  modelling  of  proof  stress  using  advanced  adaptive  numeric  methods.  In  the 
context  of  this  example  the  key  empirical  modelling  themes  of  model  validation,  model  transparency  and 
model  generalisation  were  illustrated.  Table  2  shows  the  MSE  values  for  each  of  the  quantitative  modelling 
techniques  used. 




Table 2. MSE values for the empirical models.

             Linear   Bayesian MLP   SVM
MSE Train    94.1     65.0           61.4
MSE Test     145.0    84.2           80.8

The graphical model, through its highly transparent nature, shows that a complex hierarchy of interactions
is prevalent in this dataset; as such, the separation of a particular variable's influence on the output is made
difficult. The linear model provided a benchmark against which all of the other techniques were compared.
As a consequence of its simple inflexible nature, and inability to adapt to more complex scenarios, it
exhibited the worst performance in terms of MSE. However, it does provide an indication of variable
influence through its transparent nature. The Bayesian MLP showed better performance than the simple
linear model, a consequence of its advanced features such as incorporation of input selection via ARD and
ability to prevent overfitting through regularisation. A limitation of the Bayesian MLP is that it lacks
transparency: the parameters in the network are not directly interpretable in the same way as those of the
linear model. Very local transparency can be introduced by testing the model using artificial datasets, where an
input is varied between its maximum and minimum values whilst the other variables are set to their
means. However, this process is very limited in high dimensional problems. The support vector approach
showed the best generalisation ability. The SUPANOVA framework incorporates transparency, and the
trends depicted in Figure 3 are consistent with expected theories of physical behaviour. This illustrates that
a fully transparent modelling approach does not affect generalisation performance, but provides a means for
validating the model constructed. SVMs prevent overfitting of the data by zero order regularisation in the
nonlinear feature space. A major problem with any empirical modelling approach is the extent to which
the data variables are correlated with each other. Further work will try to assess the applicability of
data preprocessing techniques to resolve this problem.


ACKNOWLEDGEMENT 

The authors would like to thank the EPSRC and British Aluminium Plate for their financial support of this
work.

REFERENCES 

1.  H. Fujii, D.J.C. MacKay, H.K.D.H. Bhadeshia, 1996. Bayesian neural network analysis of fatigue crack
growth rate in Ni-base superalloys. ISIJ International, 36, 1373-1382.

2.  A.F.  Karr,  1994.  Statistics  and  Materials  Science  -  Report  of  a  Workshop.  National  Institute  of 
Statistical  Sciences. 

3.  J.  Hadamard,  1923.  Lectures  on  the  Cauchy  Problem  in  Linear  Partial  Differential  Equations.  Yale 
University  Press. 

4.  J.  Whittaker,  1990.  Graphical  Gaussian  Models  in  Applied  Multivariate  Statistics.  Wiley  Publishers. 

5.  J. Neter, M.H. Kutner, C.J. Nachtsheim, W. Wasserman, 1996. Applied Linear Statistical Models, 4th
Edition. Irwin Publishers.

6.  C.M.  Bishop,  1995.  Neural  Networks  for  Pattern  Recognition.  Oxford  University  Press. 

7.  R.M. Neal, 1995. Bayesian Learning for Neural Networks. Springer-Verlag Publishers.

8.  D.J.C.  MacKay,  1994.  Bayesian  Non-linear  Modelling  for  the  Prediction  Competition.  ASHRAE 
Transactions:  Symposia,  OR-94-17-1,  1053-1062. 

9.  V.  Vapnik,  1995.  The  Nature  of  Statistical  Learning  Theory.  Springer-Verlag  Publishers. 

10.  S.R.  Gunn,  1999.  SUPANOVA  -  A  Sparse  Transparent  Modelling  Approach,  submitted  to  NNSP'99. 
USA. 




A  Quantum  Neural  Net:  with  Applications 
to  Materials  Science 

B.  Igelnik*,  M.  Tabib-Azar*,  Y.-H.  Pao*,  and  S.  R.  LeClair** 

*  Electrical  Engineering  and  Computer  Science  Department 
Case  Western  Reserve  University,  Cleveland,  OH,  USA 
**Material  Directorate,  Wright  Laboratory,  Fairborn,  OH,  USA 


ABSTRACT 

In this article a new neural network architecture suitable for learning and generalization is discussed and
developed. The architecture is inspired by and modeled after quantum electronic devices and circuits, where
coherent electronic wavefunctions traveling through different parts of the circuit are combined and
result in interferences at detection nodes. These wavefunctions, represented by complex numbers, are
implemented as complex weights in our neural net architecture to efficiently and accurately facilitate
certain computations. Although similar to the radial basis function (RBF) net, our computational model,
called the quantum net (QN), has demonstrated a considerable gain in performance and efficiency in a number of
applications compared to the RBF net. Its better performance in classification tasks is explained by the
cross-product terms in the internal representation of its basis functions, which are introduced parsimoniously.
These cross-products are the results of interferences naturally occurring in coherent electronic systems.
Although we primarily discuss the software implementation of QN on von Neumann computers, its hardware
implementation is also briefly discussed. A number of examples, solved using QN and other networks, are
used to illustrate the desirable characteristics of QN.


INTRODUCTION 

We explore a new computation method that we call the "quantum network" (QN), which uses complex weights in
a neural network lattice and can be constructed using quantum wires, dots, and other quantum electronic
components. Using coherent electrons or other quantum particles to perform calculations, QN takes
advantage of "interferences" that take place between these particles traveling through different paths in the
circuit. These interferences are very well known in quantum mechanics and are the basis of many physical
phenomena such as conductivity of solids, tunneling in quantum layers, optical properties of clusters, etc.
Our QN is a subset of a more general class of computational models and techniques called quantum
computers and quantum computing that has received renewed interest in recent years [1]-[8].

The mathematical model considered here is an expansion in basis functions; that is, it connects the
multivariate system input $x$ and output $y$ (which without loss of generality can be assumed univariate) by
the equation

$$y = f_{N,E_T}(x) = \sum_{n=1}^{N} a_n g_n(x, b_n),$$

where $a_n, b_n$ are adjustable real parameters and $g_n$ are called basis functions. The types and the number of
basis functions determine the architecture of the model, its accuracy and efficiency. The basis functions
usually have the same structure for any $n = 1, \ldots, N$ and constitute a superposition of a simple multivariate
function $\varphi$ of $x$ and $b_n$ (internal function) and a univariate function $g$ (external function), as shown in the
equation

$$g_n(x, b_n) = g(\varphi(x, b_n)).$$


In neural networks and RBF networks the internal function $\varphi$ is a linear sum of univariate functions


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




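$$\varphi(x, b_n) = \sum_{i=1}^{d} w_{ni}\,\psi_i(x_i, c_{ni}),$$

where $x = (x_1, \ldots, x_d)$, $\psi_i(x_i, c_{ni}) = x_i - c_{ni}$ for neural networks, $\psi_i(x_i, c_{ni}) = (x_i - c_{ni})^2$ for RBF
networks, and $w_{ni}, c_{ni}$ are adjustable parameters. The QN architecture constitutes a model of the following form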

$$y = a_0 + \sum_{n=1}^{N} a_n\, g\left[(x - d_n)\circ c_n \cdot (\bar{c}_n)^{T}\circ(x - d_n)\right], \qquad (1)$$

where the parameters $a_n$, $n = 0, \ldots, N$ are real numbers, while the parameters $c_n$, $n = 1, \ldots, N$ are complex
vectors, $\bar{c}_n$ means the complex conjugate of $c_n$, and $\circ$ stands for the inner product of two vectors. Unlike the
neural or RBF net, the internal representation in a basis function is not a weighted sum of univariate
functions but constitutes a quadratic function of $d$ variables with cross-product terms of a special form. This
form of internal representation, as we show in the next two sections, has both "hardware" and "software"
implementations.
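
The cross-product structure can be made concrete with a short sketch. It assumes that the internal representation
is the squared modulus of a complex amplitude, |sum_j c_j (x_j - d_j)|^2, with d read as a real vector of centre
coordinates; this reading, the function names and the example values are assumptions for illustration. Writing
c_j = w_j exp(i theta_j), the same quantity is a real quadratic form whose cross terms carry cos(theta_j - theta_k).

```python
import numpy as np

def qn_internal(x, d, c):
    """Squared modulus of the complex amplitude sum_j c_j (x_j - d_j)."""
    s = np.sum(c * (x - d))
    return float((s * np.conj(s)).real)

def qn_internal_expanded(x, d, c):
    """Same quantity written as a real quadratic form with cross-product terms."""
    u = x - d
    w, theta = np.abs(c), np.angle(c)
    phase_diff = theta[:, None] - theta[None, :]        # theta_j - theta_k
    return float(np.sum(np.outer(w * u, w * u) * np.cos(phase_diff)))

x = np.array([1.0, 0.0])
d = np.array([0.5, 0.5])
c = np.array([1.0, np.exp(1j * np.pi / 3)])             # unit moduli, phases 0 and pi/3
value = qn_internal(x, d, c)
value_expanded = qn_internal_expanded(x, d, c)
```

Both functions return the same value, which shows that the phases of the complex weights act only through their
differences, in the same way as interference between coherent paths.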

MOTIVATION 

We were intrigued by the ability of QN to solve some interesting problems, such as the XOR problem, in a
very simple and straightforward manner. While neural or RBF networks with one nonlinear basis
function cannot solve the XOR problem, the quantum net can do it with only one node. As will be shown
below, this ability is due to the specific construction of the internal representation of a basis function in the
quantum net. This fact suggests that in some applications QN will work better than neural or RBF
networks, at least for classification tasks. Several examples confirm this guess. The number of parameters
in a basis function of a quantum net is roughly twice that of similar neural or RBF networks. However,
savings in the number of basis functions may exceed losses in the number of parameters. Even a relatively
small decrease in the number of basis functions can compensate for an increase in the number of parameters, since
computational time grows as a quadratic function of the number of basis functions. Another rationale for
developing the quantum net is its universal approximation capability.

To  illustrate  the  concept  of  quantum  computing,  we  examine  the  well-known  double-slit  interference 
experiment  shown  in  Figure  1 . 


Fig. 1. Schematic of the double-slit interference experiment to illustrate the concept of quantum computing.

We note that the double-slit apparatus can be used to perform both OR and XOR functions depending on
the location of the detector on the screen. To perform the OR function the light intensity on the screen is
integrated from $x_1$ to $x_2$. Hence, when only one of the slits is open there is enough light at the detector to
exceed the threshold for "1". If both slits are open, the light at the detector almost doubles and, after
thresholding, a logic one is detected. In the regions where the XOR function is performed, the light
intensity is zero when both slits are open (due to destructive interference between the light coming through
the different slits) or when both of them are closed. The light intensity in this region is high when only one of
the slits is open. By choosing an appropriate function for thresholding, this system can be made relatively
immune to noise and to variations in slit size and other parameters.

Quantum devices in particular, and "wave-mechanical" devices in general, can be readily used to perform
the following operations: attenuation and amplification, phase change, and detection. Mathematically,
attenuation and amplification of an electronic wavefunction ($\psi$) amounts to multiplication by a real number
($A\psi$), while changing its phase ($e^{i\theta}\psi$) is accomplished by multiplication by a complex number. Detection of
the electronic wavefunction is accomplished by integration of the square modulus of the wavefunction over
a region in space corresponding to the detector's active volume $V$:

$$\text{Probability of Detection} = \int_V \psi^*(v)\,\psi(v)\,dv.$$

When $\psi$ is a slowly-varying function of position and for a unit active volume of the detector, the
probability of detection is approximately proportional to the square modulus of the wavefunction:

$$\text{Probability of Detection} \approx \alpha|\psi|^2.$$
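
The OR and XOR behaviour of the double-slit arrangement can be sketched numerically using the square-modulus
detection rule. The sketch idealises each open slit as contributing a unit-amplitude wave and represents position on
the screen only through the phase difference between the two paths; these simplifications and the function name are
assumptions made for illustration.

```python
import numpy as np

def intensity(slit1_open, slit2_open, phase_diff):
    """|psi1 + psi2|^2 for unit-amplitude waves from the open slits."""
    psi = slit1_open * 1.0 + slit2_open * np.exp(1j * phase_diff)
    return np.abs(psi) ** 2

states = [(0, 0), (0, 1), (1, 0), (1, 1)]

# XOR region: a detector placed where the two paths differ in phase by pi
xor_table = {s: float(intensity(s[0], s[1], np.pi)) for s in states}

# OR region: intensity averaged over a full fringe period
phases = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
or_table = {s: float(np.mean(intensity(s[0], s[1], phases))) for s in states}
```

In the XOR region the intensity vanishes when both slits are open and equals one when exactly one is open. Averaged
over a fringe period, the interference term cancels and the intensity with both slits open is twice that of a single
slit, so any threshold below one implements OR.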

XOR  PROBLEM 

The XOR problem is to find a curve $f(x,y) = t$ that separates the points $A(0,0)$ and $C(1,1)$ from the points
$B(1,0)$ and $D(0,1)$, as shown in Figure 2. That means that there exist a real number $t$ and a model $z = f(x,y)$
such that the points A and C are on one side of the curve $f(x,y) = t$ and the points B and D are on
the other side of the curve. Obviously we can take $t = 0$. Then we can prove the following propositions [9].

Proposition 1.

Any net of the form

$$f(x, y) = g(\psi_1(x) + \psi_2(y)), \qquad (2)$$

where $g$ is a monotonic, fixed univariate function and $\psi_1, \psi_2$ are arbitrary fixed-shape univariate functions,
cannot solve the XOR problem.

Proposition 2.

There exists a net $f$ of the form (2) with a fixed monotonic univariate function $g$ and adaptive-shape
differentiable functions $\psi_1, \psi_2$, formed from polynomials, which solves the XOR problem. Any such net should
have at least 8 parameters.

Proposition  3. 

There  exists  a  quantum  net  with  one  node  and  4  parameters  which  solves  XOR  problem. 

Proof. The equation of the separating curve for a QN with one node is

$$(x - 0.5)^2 + 2(x - 0.5)(y - 0.5)\cos(\theta_2 - \theta_1) + (y - 0.5)^2 = a^2. \qquad (3)$$

Transform the variables $x$ and $y$ to new variables $u$ and $v$ by rotating the coordinate axes through the angle $\pi/4$:

$$x - 0.5 = u\sqrt{2}/2 - v\sqrt{2}/2, \qquad y - 0.5 = u\sqrt{2}/2 + v\sqrt{2}/2. \qquad (4)$$

Substituting (4) into (3), one obtains

$$2u^2\cos^2\frac{\theta_2 - \theta_1}{2} + 2v^2\sin^2\frac{\theta_2 - \theta_1}{2} - a^2 = 0, \qquad (5)$$

which is the equation of an ellipse with axes parallel to the coordinate axes $u$ and $v$. Therefore, in the
coordinates $x$ and $y$, equation (5) also constitutes the equation of an ellipse, with the angle between the axes of
the ellipse and the coordinate axes equal to $\pi/4$. This is shown in Figure 2.


Fig.  2.  Geometric  illustration  of  the  solution  of  the  XOR  problem  by  quantum  net 

Substitution of the coordinates of the points A, B, C, and D in the left-hand side of equation (5) yields
(the first inequality for A and C, the second for B and D)

$$\cos^2\frac{\theta_2 - \theta_1}{2} - a^2 > 0, \qquad \sin^2\frac{\theta_2 - \theta_1}{2} - a^2 < 0. \qquad (6)$$

The simultaneous inequalities (6) are satisfied, for example, if

$$a^2 = 0.5, \qquad 0 < \theta_2 - \theta_1 < \pi/4.$$

Therefore, there exists a quantum net with only one basis function which solves the XOR problem. This
net requires not more than 4 parameters if we make the position of the center of the ellipse adjustable.
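
The proof can be checked numerically. The sketch below evaluates the quadratic form of the one-node separating
curve, centred at (0.5, 0.5), at the four XOR points; the function name and the particular phase difference pi/8 are
illustrative choices within the range stated above.

```python
import math

def qn_xor_value(x, y, delta, a2=0.5):
    """Quadratic form of the separating curve minus a^2; delta is theta_2 - theta_1."""
    xc, yc = x - 0.5, y - 0.5
    return xc * xc + 2.0 * xc * yc * math.cos(delta) + yc * yc - a2

delta = math.pi / 8                                   # any value in (0, pi/4)
points = [(0, 0), (1, 1), (1, 0), (0, 1)]             # A, C, B, D
values = {p: qn_xor_value(p[0], p[1], delta) for p in points}
xor_output = {p: int(v < 0) for p, v in values.items()}   # inside the ellipse -> 1
```

Points A and C fall outside the ellipse (positive value) and points B and D fall inside it (negative value), so
thresholding at zero reproduces the XOR truth table.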

SOFTWARE  IMPLEMENTATION 

For  practical  consideration  the  quantum  net  should  be  written  in  the  form 


H 

II 

N 

1  d  o  / 

„  d  „  , 

j~C«j) 

n= 1 

L  *  >=> 

j= 1 

where the model is in the coordinate form, all the parameters $a_0, a_n, c_{nj}, \theta_{nj}, w$ are real, $w$ is the absolute
value and $\theta_{nj}$ is the argument (phase) of a complex parameter. We assume that the input variables $x_1, \ldots, x_d$
are scaled so that $0 \le x_j \le 1$. For definiteness we assume that the external function $g$ is the Gaussian

$$g(t) = e^{-t^2/2}.$$

The values of the internal parameters are specified by the following inequalities

$$0 \le c_{nj} \le 1, \qquad 0 \le \theta_{nj} < \pi - \Delta \quad \text{or} \quad \pi + \Delta < \theta_{nj} \le 2\pi,$$

where $\Delta > 0$ is any number small compared to $\pi$. We divide all data that are available for learning into
two sets, the training set $E_T$ and the generalization set $E_G$. The ensemble approach [10], [11] is used for
training and testing. It constitutes a further development of the ideas of the Functional-Link net [12]. The
training set is used to adjust the parameters $a_0, a_n, c_{nj}, \theta_{nj}$ on the criterion of minimal training error, while
the testing set is used to determine the number of basis functions $N$. The parameter $w$ can be adjusted
manually. The learning of the model is made by the ensemble approach. The main features of this approach
are as follows. The algorithm of learning is sequential; that is, only one node is learned at a time.




The learning starts with the simplest net of the form (node 0) $y = a_0$ and then grows the net node by node. Next,
consider the best net that can be chosen from the ensemble. This is done by a process of adaptive
stochastic optimization. The whole ensemble of $K$ possible choices of the parameters $c_{nj}, \theta_{nj}$ is divided into
$M$ groups, each having $L$ members, so that $K = ML$. In the first group the parameters $c_{nj}, \theta_{nj}$ are generated
uniformly from the intervals $[0, 1]$ and $[0, 2\pi] - [\pi - \Delta, \pi + \Delta]$. After the first group of the parameters
$c_{nj}, \theta_{nj}$ has been chosen, and the parameters $a_0, a_1, \ldots, a_n$, the net output, and the training error have been
calculated, the net with the minimal training error is identified. The internal parameters $c_{nj,\mathrm{opt}}, \theta_{nj,\mathrm{opt}}$ of
this optimal net are kept in memory and used to correct the distribution of the parameters in the groups
$2, 3, \ldots, M$. For these groups, instead of the uniform distribution, we use the triangle distribution.
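
The group-wise random search for a single node can be sketched as follows. Several details are assumptions made
for illustration and are not taken from the source: the basis function is taken as a Gaussian of the squared modulus
of a complex amplitude, the triangle distribution is given its peak at the best parameters found so far (updated
after every candidate), phases falling in the excluded band are simply redrawn, and all names are hypothetical.

```python
import numpy as np

def basis(X, c, theta, w=1.0):
    """Assumed one-node basis: Gaussian of |sum_j w*exp(i*theta_j)*(x_j - c_j)|^2."""
    s = (X - c) @ (w * np.exp(1j * theta))          # complex amplitude per pattern
    return np.exp(-np.abs(s) ** 2)

def sample_theta(rng, d, delta, mode=None):
    """Draw d phases from [0, 2*pi] minus [pi - delta, pi + delta].

    Uniform when mode is None, otherwise triangular with its peak at mode[j].
    Draws that fall in the excluded band are rejected and repeated."""
    theta = np.empty(d)
    for j in range(d):
        while True:
            if mode is None:
                t = rng.uniform(0.0, 2.0 * np.pi)
            else:
                t = rng.triangular(0.0, mode[j], 2.0 * np.pi)
            if abs(t - np.pi) > delta:
                theta[j] = t
                break
    return theta

def train_one_node(X, y, rng, groups=5, members=20, delta=0.1):
    """Pick the best of K = groups * members random candidates for one node."""
    d = X.shape[1]
    best = None
    for m in range(groups):
        for _ in range(members):
            if m == 0:                               # first group: uniform sampling
                c = rng.uniform(0.0, 1.0, d)
                theta = sample_theta(rng, d, delta)
            else:                                    # later groups: triangular sampling
                c = rng.triangular(0.0, best["c"], 1.0)
                theta = sample_theta(rng, d, delta, mode=best["theta"])
            A = np.column_stack([np.ones(len(X)), basis(X, c, theta)])
            a, *_ = np.linalg.lstsq(A, y, rcond=None)    # linear weights a_0, a_1
            error = float(np.mean((A @ a - y) ** 2))
            if best is None or error < best["error"]:
                best = {"c": c, "theta": theta, "a": a, "error": error}
    return best

X_xor = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
y_xor = np.array([0.0, 1.0, 0.0, 1.0])
best = train_one_node(X_xor, y_xor, np.random.default_rng(0))
```

Because the output weights enter linearly, each candidate requires only a least-squares solve; the random search is
confined to the internal parameters. Growing the net would repeat this step for each additional node.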


UNIVERSAL  APPROXIMATION  CAPABILITY  OF  THE  QUANTUM  NET 

The universal approximation capability of the quantum net was proved in [9] in the following theorem.

Theorem. (The universal approximation capability of the quantum net).

Suppose the external function $g$ satisfies the conditions


>•1-1 


( d  N 

g 

1'='  J 

dt)..dtd  <  °° . 


Define a distance between a function $f$ defined and continuous on $I^d$, and a quantum net $f_N$


7. 


2X  «•*(*, 


8. 


as 

P  (/.  fs )  =  -  h  MP  A  •  -9- 

Thus for any $\varepsilon > 0$ and any function $f$ defined and continuous on $I^d$, there exists a quantum net $f_N$ such
that

$$\rho(f, f_N) < \varepsilon. \qquad (10)$$


EXAMPLES  OF  APPLICATIONS  IN  MATERIALS  SCIENCE 

Example  1. 

The comparison between the quantum net and the RBF net was also made on real data [13]. The body of data
comprises 6431 patterns of ternary systems (systems of 3 chemical elements) with 15 features of the
elements in the system, 5 for each element. For each system it is known whether it can or cannot form a
compound. This information is available through long and expensive experimentation and lengthy
calculations. The task is to build a neural net which can accurately predict possible formation of a
compound for a new system not available in the database. The quantum net and the RBF net were applied
to this data, both using the ensemble approach for learning and generalization. With the RBF net the results
are 72 misclassifications (95.5% correct) in a testing set of 1607 patterns, using 44 nodes; with the quantum
net, 60 misclassifications (96.2% correct) are obtained with 30 nodes and 49 misclassifications (97.0%
correct) with 70 nodes. Our experiments have indicated that the activation function for the quantum net can
be chosen from the same set of functions as for the RBF net. In particular, the minimum of generalization error
is achieved with the "thin plate" activation function $f(t) = t^2 \log(t)$. These results are shown in Figure 3.




Fig. 3. Training error (upper) and testing error (lower) versus number of nodes for QN (Villars' data).

Example 2.

A quantum net and an RBF net are compared in the task of building a cellular automata (CA) based model
of thin film growth [14]. Atoms of types A and B are sent to the substrate by two heated sources. Those
atoms which bond with each other and/or with the substrate form a surface film. The geometric
features of the surface, such as average roughness, are of great importance to the quality of the film.
Depending  on  the  current  state  of  the  surface  and  substrate,  an  incoming  atom  can  form  different  types  of 
bonding  with  the  surface  or  remain  as  a  vapor.  For  the  current  state  of  the  model,  6  possible  states  of  an 
atom  are  assumed.  These  are  AA  bond,  AB  bond,  adsorbed,  wall-adsorbed,  cliff-adsorbed,  and  vapor.  In 
the  CA  model  it  is  supposed  that  the  actual  state  of  an  atom  depends  not  on  the  entire  substrate  and  surface 
but  only  on  the  states  of  atoms  in  the  neighborhood  of  the  incoming  atom.  The  neighborhood  constitutes  26 
cells  that  together  with  the  incoming  atom  form  a  cube  with  the  incoming  atom  at  the  center.  Surrounding 
cells  are  filled  by  atoms  of  type  A  or  B,  or  they  are  empty.  The  state  of  an  incoming  atom  can  be  predicted 
given  the  state  of  the  neighborhood,  temperature  and  some  probabilities  calculated  by  using  laws  of 
statistical  physics.  It  is  impossible  for  such  a  model  to  operate  in  a  reasonable  time  given  that  calculations 
must  be  carried  out  for  millions  of  atoms.  That  is  why  a  neural  net  is  used.  After  training  on  a  number  of 
known  examples,  the  net  can  predict  the  current  state  of  the  incoming  atom. 

In the current model, we used two discrete variables to characterize the neighborhood, the
temperature, and 3 probabilities (altogether 6 variables) as inputs to the net, and one discrete output
taking 6 possible values. The number of training patterns was 3208, and the number of testing patterns was
1069. Comparison of the RBF and quantum net is made in terms of the number of misclassifications of the
output and the time required to predict the state of an incoming atom. Results are shown in Table 1.


Table 1: Comparison between RBF and quantum nets in terms of time and accuracy.

Type of net   # training   # testing   # misclassed   % correct       Time / test
              patterns     patterns    patterns       test patterns   pattern
RBF           3208         1069        70             93.4            14 µsec
QN            3208         1069        60             94.4            12 µsec

Example  3. 

The data set consisted of 676 points describing the dependency of the optical thickness of a thin film (output)
on its spectral pattern (input) [15]. The input constituted a 33-dimensional vector. The output values were
uniformly distributed in the range [0.5, 5.5] with the average value equal to 3. Thus, 1% of error corresponds
to 0.03, or 0.0009 MSE. Three quarters of the data (507 patterns) were used for training and one quarter
(169 patterns) was used for testing. This is an example of learning a continuous function with a large
number of variables. The results of training and testing for a quantum net are shown in Figure 4.
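
The conversion between a percentage error and an MSE used in this example can be written out as a short sketch.
It assumes, as the text does, that the percentage is taken relative to the average output value of 3; the function
name is illustrative.

```python
def percent_error_to_mse(percent, mean_output=3.0):
    """Convert a percentage error (relative to the mean output) into an MSE."""
    abs_error = percent / 100.0 * mean_output     # e.g. 1% of 3 is 0.03
    return abs_error ** 2

mse_1_percent = percent_error_to_mse(1.0)
mse_half_percent = percent_error_to_mse(0.5)
```

Under this convention an error of 1% corresponds to an MSE of 0.0009 and an error of 0.5% to an MSE of 0.000225.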


Fig.  4.  Training  MSE  (lower)  and  testing  MSE  (upper)  versus  number  of  nodes  for  quantum  net. 

The level of 1% of testing error was achieved with a net of 68 nodes (0.000895 MSE), while the training
error was 0.8% (0.000597 MSE). The best results were obtained with a net of 170 nodes: testing error
0.37% (0.000121 MSE), training error 0.16% (0.000031 MSE). The corresponding results for an RBF net of
the same size were: testing error 0.5% (0.000225 MSE), training error 0.27% (0.000063 MSE). These
examples confirm that the quantum net has a visible advantage in accuracy and efficiency of learning and
generalization compared with the RBF net. These advantages will become even greater when quantum
computers make calculations with complex numbers much faster than now.

HARDWARE  IMPLEMENTATION 

To  demonstrate  the  practicality  of  the  hardware  implementation  of  QNs,  we  discuss  a  specific  example 
below  involving  the  XOR  function.  A  QN  capable  of  performing  XOR  is  shown  in  Figure  5  where  an 
Aharanov-Bohm  ring  is  used  with  two  FET  switches.  The  magnetic  field  perpendicular  to  the  ring  is  chosen 
so  that  when  both  the  FETs  are  on  (conducting)  the  current  through  the  ring  is  zero  (i.e.,  destructive 
interference  case)  and  when  either  of  the  FETs  are  closed,  the  current  is  non-zero.  It  should  be  noted  that 
this  structure  is  very  similar  to  the  regular  microwave  waveguide  structure  (with  angstrom  dimensions)  and 
that  the  reflection  paths  should  be  properly  terminated  /matched  to  prevent  unwanted  interferences. 

Only  the  magnetic  field  is  varied  as  the  adjustable  parameter  to  train  this  QN.  The  output  is  monitored 
while the magnetic field is varied to ensure zero output when both FETs are on. A small feedback circuit
can be devised to automate the training (very much like the Hopfield net). This hardware implementation is
shown  schematically  in  Figure  5. 
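
The XOR behaviour described above can be illustrated with a minimal numerical sketch. It assumes an idealized two-path interference model that is not given in the paper: each conducting arm contributes a unit amplitude with phase +theta or -theta, so the arms differ in phase by 2*theta, and the current is taken as the squared magnitude of the summed amplitudes. The names `ring_current`, `train_theta` and `xor_output`, the scan resolution and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import cmath
import math


def ring_current(gate_a, gate_b, theta):
    """Normalized current through an idealized two-arm ring.

    gate_a, gate_b: 1 if the FET in that arm conducts, 0 otherwise.
    theta: the arms acquire phases +theta and -theta (assumed model).
    """
    amplitude = gate_a * cmath.exp(1j * theta) + gate_b * cmath.exp(-1j * theta)
    return abs(amplitude) ** 2


def train_theta(steps=200):
    """Scan theta over [0, pi] and keep the value that minimizes the
    output when both FETs conduct (theta stands in for the magnetic field)."""
    candidates = [k * math.pi / steps for k in range(steps + 1)]
    return min(candidates, key=lambda t: ring_current(1, 1, t))


def xor_output(gate_a, gate_b, theta, threshold=0.5):
    """Read the ring current as a logic level (threshold is an assumption)."""
    return int(ring_current(gate_a, gate_b, theta) > threshold)


theta = train_theta()
truth_table = {(a, b): xor_output(a, b, theta) for a in (0, 1) for b in (0, 1)}
```

In this simplified model the scan settles where the two arms cancel when both FETs conduct, while a single conducting arm still carries current.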

CONCLUSION  AND  FUTURE  WORK 

The new neural network architecture suggested in this paper is well motivated and has shown a
considerable and measurable advantage over the RBF net in performance and efficiency in a number of
applications. Our future work will concentrate both on applications of this architecture, in particular
in the area of smart sensors, and on the theoretical development of a new, completely adaptive architecture.


Fig.  5.  An  example  of  hardware  implementation  of  QN  using  an  Aharonov-Bohm  ring.  The  ring, 
schematically  shown  here,  is  composed  of  a  high-quality  gold  or  AlGaAs/GaAs  2-D  gas  layer. 
The  gates  are  used  to  raise  or  lower  the  potential  barrier  and  control  the  current  flow  through  a 
given arm. The magnetic field, B, is perpendicular to the ring's plane and it causes a phase
difference of 2θ between the electronic wavefunctions flowing through the two arms. The value of
θ depends on the strength of the magnetic field.

REFERENCES 

1. A. Garcia and M. Tabib-Azar, 1995. Sensing Means and Sensor Shells: A New Method of Comparative
Study of Piezoelectric, Piezoresistive, Electrostatic, Magnetic, and Optical Sensors. Sensors and
Actuators A: Physical, 48(2), 87-100.

2. M. Tabib-Azar, 1998. Microactuators: Electrical, Magnetic, Thermal, Optical, Mechanical, Chemical
and Smart Structures. Boston, MA: Kluwer Academic Publishers.

3. P. W. Shor, 1994. Algorithms for Quantum Computation: Discrete Logarithms and Factoring. In
Proceedings of 35th Annual IEEE Symposium on Foundations of Computer Science, 124-137.

4. D. P. DiVincenzo, 1995. Two-bit gates are universal for quantum computation. Phys. Rev. A, 51(2),
1015-1022.

5. C. H. Bennett, 1973. The Fundamental Physical Limits of Computation. IBM J. Res. Dev., 17, 525.

6. D. Deutsch and R. Jozsa, 1992. Rapid Solution of Problems by Quantum Computation. Proc. Royal
Soc., A 439, 553-558.

7. D. Deutsch, 1985. Quantum theory, the Church-Turing principle and the universal quantum computer.
Proc. Royal Soc., A 400, 97-117.

8. S. Lloyd, 1993. A Potentially Realizable Quantum Computer. Science, 261(17), 1569-1571.

9. B. Igelnik, M. Tabib-Azar and S. R. LeClair, 1999. Quantum net: a net with complex coefficients.
Submitted to IEEE Transactions on Neural Networks.

10. B. Igelnik and Y.-H. Pao, 1995. Stochastic choice of basis functions and adaptive function
approximation. IEEE Transactions on Neural Networks, 6(6), 1320-1329.

11. B. Igelnik, Y.-H. Pao, S. R. LeClair and C.-Y. Shen, 1999. The ensemble approach to neural network
learning and generalization. IEEE Transactions on Neural Networks, 10(1), 19-30.

12. Y.-H. Pao, 1989. Adaptive pattern recognition and neural networks. Reading, MA: Addison-Wesley.

13. P. Villars, 1998. Private communication.

14. A. Jackson and M. Benedict, 1997. Private communication.

15. S. Fairchild, 1998. Private communication.




Ontology  for  Phase  Diagram  Databases 

N.  Ono*,  R.  Kainuma*,  H.  Ohtani**,  K.  Ishida***  and  M.  Kato* 

*Graduate  School  of  Engineering,  Tohoku  University,  Sendai,  Japan 
**Center  for  Interdisciplinary  Research,  Tohoku  University,  Sendai,  Japan, 
***New  Industry  Creation  Hatchery  Center,  Tohoku  University,  Sendai,  Japan 


ABSTRACT 

Owing to the close similarity between the common forms of ontology specification and classes in object-oriented
systems, a set of object-oriented classes in a domain may provide the skeleton of the ontology of that
domain. An object-oriented design for a phase diagram database has been made with special attention to this point.
The nature of the task of ontology construction is illustrated through a description of the analysis and
coding of phase diagrams.


INTRODUCTION 

It  has  been  emphasized  in  recent  years  that  an  ontology  provides  a  common  foundation  for  the  development 
of  intelligent  systems  such  as  databases,  data  mining  systems,  expert  systems,  etc.  in  a  field  of  expertise 
[1].  An  ontology  as  such  is  expected  to  facilitate  not  only  the  development  of  individual  systems  but  also 
their  maintenance  and  integration.  The  same  should  apply  to  the  field  of  materials  science  and  engineering. 
It  is  to  be  noted  here  that  an  ontology  is  a  form  of  representation  of  knowledge  in  a  field.  When  the  field  is 
a  professional  discipline  such  as  materials  science  and  engineering,  therefore,  contribution  from  experts  in 
the  field  is  indispensable  in  the  compilation  of  an  ontology. 

The present authors are currently constructing a phase diagram database using MOOD-SX, an
object-oriented database system (OODBS) developed by Ono [2]. As is typically seen in the ASM phase
diagram handbook [3], information on phase diagrams includes not only the diagrams themselves but also descriptions of
phases and their structures, invariant reactions, bibliographic data and so on. In the course of examining
the contents of these items as a basis for our database schema design, we noticed that the design of
an object-oriented system closely resembles ontology construction [1, 4, 5, 6]. In both cases, we need
to encode the entities in the domain of interest and the relations among them. Furthermore, the way of encoding
generally used in current ontology studies resembles that used in object-oriented system development.
For this reason, and given the significance of ontology described above, we consider it worthwhile to
extend the scope of our work on the phase diagram database from providing containers for data items to
contributing to the construction of an ontology for materials science and engineering.

In  the  following,  similarity  between  the  ontology  compilation  and  object-oriented  database  design  will  be 
examined.  It  will  be  shown  that  a  set  of  object-oriented  classes  can  provide  the  skeleton  of  an  ontology  of 
the  domain.  Following  these,  entities  involved  in  the  description  of  phase  diagrams  are  examined  and  a  set 
of  codes  to  capture  them  and  relations  among  them  will  be  proposed  and  discussed. 


ONTOLOGY  AND  MOOD-SX 

The  Free  On-line  Dictionary  of  Computing  [7]  defines  ontology  as  follows: 

An  explicit  formal  specification  of  how  to  represent  the  objects,  concepts  and  other  entities  that  are 
assumed  to  exist  in  some  area  of  interest  and  the  relationships  that  hold  among  them. 

The  examination  of  two  currently  common  ontology  specification  languages,  Ontolingua  [8]  and  Cyc  [9], 
indicates  that  the  relationships  are  generally  categorized  into  the  following  three  kinds: 


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




1. Is-a relationships

2. A-kind-of relationships

3. A-part-of relationships

In the statement "sodium is an alkali metal", "alkali metal" is a collection of things and "sodium" is a
member of the collection. This is called an is-a relationship. Such a collection is also called a class, and it is
often said that "sodium" is an instance of the class "alkali metal". When all members of the class
"alkali metal" are also members of another class "metal", "alkali metal" is a kind of "metal". Entities of
interest are captured as either classes or instances and arranged in a hierarchical classification tree. We
generally assign specific names to these entities, so an ontology is often compared with collections of
such names, for example a thesaurus or an electronic dictionary. The third kind, a-part-of relationships, is not
captured in such collections.

For example, metals have certain crystal structures. A crystal structure type, say A2, appears in the
description of many metals. A2 is thus an entity separate from metal. It is related to metals and constitutes
a part of them, e.g., "Sodium has a structure, A2". This link between sodium and A2, or more generally the link between
the class metal and the class crystal structure type, is usually expressed by stating that the class
metal has an attribute, "crystal structure type", whose value is constrained to the class
crystal structure type. Such attributes of a class are inherited by its subclasses together with their constraints.
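
The three kinds of relationship can be mirrored in an ordinary object-oriented language. The following is a minimal sketch in Python, not MOOD-SX code; the class and variable names are chosen only for illustration, and the constraint on the attribute value is enforced with a simple run-time type check.

```python
class CrystalStructureType:
    """Class whose instances are structure types such as A2."""

    def __init__(self, name):
        self.name = name


class Metal:
    """Class with an attribute constrained to CrystalStructureType."""

    def __init__(self, name, crystal_structure_type):
        # The attribute value is constrained to the class CrystalStructureType.
        if not isinstance(crystal_structure_type, CrystalStructureType):
            raise TypeError("crystal_structure_type must be a CrystalStructureType")
        self.name = name
        self.crystal_structure_type = crystal_structure_type


class AlkaliMetal(Metal):
    """A kind of Metal; the attribute and its constraint are inherited."""


A2 = CrystalStructureType("A2")
sodium = AlkaliMetal("sodium", A2)

is_a = isinstance(sodium, AlkaliMetal)   # sodium is an alkali metal
a_kind_of = issubclass(AlkaliMetal, Metal)  # alkali metal is a kind of metal
part = sodium.crystal_structure_type     # sodium has a structure, A2
```

Because the check lives in the superclass, an attempt to create an alkali metal with a plain string as its structure type is rejected in the same way as for any other metal.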

It is apparent that the constitution of an ontology, i.e., the mechanism for explicit formal specification of entities
and relationships described so far, is basically the same as the data structure of object-oriented
systems. Programming-oriented systems such as C++, Java and Smalltalk, and OODBSs based on them, are
more restricted than this. For example, these systems do not support the naming of individual instances, and Java
and Smalltalk do not allow a class to have more than one superclass. Some OODBSs, such as O2 [10] and
MOOD-SX, support both. Object-oriented systems of this kind are capable of capturing the aspects of ontology described
so far.

Ontology  languages  have  features  other  than  these.  Cyc  expresses  a-part-of  relationships  with  binary 
predicates  and  those  predicates  are  classified  according  to  their  nature  in  a  variety  of  aspects.  For  example, 
methods  to  handle  particular  attributes  and  their  values  are  inherited  along  the  classification  of  attributes. 
This  classification  facilitates  implementation  of  functions  in  application  programs  based  on  the  ontology 
and  it  will  be  beneficial  in  database  services  that  include  user  interfaces.  MOOD-SX  and  Ontolingua  do  not 
support this classification in general. However, a basic classification of attributes, i.e., whether their value should
be a single value or a collection of values, is supported in all of these. In MOOD-SX, a collection of things
(A, B) is expressed very explicitly as:

#list
|-with
    =A
    =B

In the convention used in the description of MOOD, this is an instance of a class list (indicated by its
name following #) that has an attribute, with (indicated by the leading |-), whose values are A and B
(indicated by =). For collections, distinctions are made as to whether their elements are ordered, whether
each element is unique, and what the cardinality is. MOOD-SX does not support the last two of these.
The following features may be further enumerated:

1. Constraints on attributes inherited from superclasses may be made more restrictive in subclasses. This
is allowed in MOOD-SX, but not in common object-oriented systems.

2.  Codes  in  Cyc  and  Ontolingua  are  written  in  LISP  and  various  functions  either  to  infer  or  to  calculate 
attribute  values  can  be  implemented  with  LISP.  MOOD-SX  is  equipped  with  a  few  kinds  of  default 
reasoning  mechanisms  but  it  does  not  yet  support  linkage  with  a  full  programming  language. 

3.  A  set  of  classes  can  be  declared  mutually  disjoint  in  Cyc  and  Ontolingua  but  not  in  MOOD-SX. 




As seen from these, the power of MOOD-SX in ontology specification is a subset of that of Cyc and
Ontolingua. It is not possible, therefore, to convert classes and instance objects stored in the MOOD-SX
system to these codes automatically. MOOD-SX is still capable of capturing the basic elements of an ontology: the entities in
the field of interest and the essential relationships among them. Identifying these elements and encoding them
is the starting point of ontology construction and calls for the most intensive contributions from domain
experts. Domain knowledge encoded with classes and instance objects in MOOD-SX can, therefore, play
the role of an intermediate expression that is ready for more intensive formalization.


ONTOLOGY  FOR  PHASE  DIAGRAMS 

Because our primary aim is to provide a good phase diagram database service, the ontological analysis, i.e.,
the enumeration of entities and relationships, is most appropriately carried out in a top-down manner
starting from the phase diagram. This and other entities that appear as components of the description of
phase  diagrams  such  as  the  elements,  phases,  crystal  structures  and  so  on  should  be  related  to  upper 
ontological  structures.  More  specifically,  we  need  to  have  at  least  one  superclass  for  each  of  these  classes 
and  it  is  desirable  that  the  superclass  is  smoothly  linked  to  yet  higher  hierarchy  such  as  the  Cyc  upper 
ontology.  It  is  also  desired  that  the  specifications  of  individual  components  are  not  only  suitable  for  the 
description  of  phase  diagrams  but  also  valid  in  more  general  contexts  in  science. 

Phase diagrams are in fact diagrams, and the lines and regions drawn in them convey significant pieces
of information. Nevertheless, the study of the semantics of these drawings is left for future work. For the
moment, we confine ourselves to representing the descriptions of features found in those
diagrams as well as the various notes that generally accompany phase diagrams.

Identification  of  Sections:  Issues 

To begin with, we need to establish a way to identify individual phase diagrams. This starts with a
naive question: suppose that our system stores an Al-Fe binary phase diagram and a user asks
for an Fe-Al phase diagram. How would our system know that it is appropriate to report the Al-Fe
phase diagram? In handbooks such as [3], this problem is handled by enforcing a rule that the two components
of a binary phase diagram are listed in alphabetical order. This rule, however, becomes more difficult to
adhere to in the description of pseudo-binary diagrams, i.e., the temperature-concentration (T-C) sections of
ternary systems and of systems with more components. The Fe-10%Ni - Fe-20%Cr diagram, for example, should be
written as Cr-80%Fe - Fe-10%Ni.

It is desirable, therefore, that our system be intelligent enough to know that the above two expressions are
equivalent to each other and thus save us the trouble of this rewriting. To achieve this, we first
need to let the system acknowledge that a description of a section, A-B, is equivalent to B-A. In MOOD-SX,
this is expressed with an instance of a system-intrinsic class pair as follows:

#pair
|-of=A
|-and=B

The pair as such is the MOOD implementation of the more general notion: “a set (in which the order of
appearance is insignificant) of cardinality 2”.
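
The same notion can be sketched outside MOOD-SX. The example below uses Python's built-in `frozenset` as a stand-in for the pair class; the function names and the small in-memory store are hypothetical and serve only to show that a query for Fe-Al reaches a diagram stored as Al-Fe.

```python
def section_key(a, b):
    """Unordered pair: a set of cardinality 2 in which order is insignificant."""
    if a == b:
        raise ValueError("a pair needs two distinct members")
    return frozenset((a, b))


# Hypothetical store holding a single Al-Fe diagram.
diagrams = {section_key("Al", "Fe"): "Al-Fe binary phase diagram"}


def find_diagram(a, b):
    """Look a diagram up by its section, whatever the order of the members."""
    return diagrams.get(section_key(a, b))


reported = find_diagram("Fe", "Al")
```

No alphabetical-ordering rule is needed here, because the key itself carries no order.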

Another issue is the equivalence of Fe-20%Cr and Cr-80%Fe. Because this problem is common to all
descriptions of composition, it will be discussed elsewhere in a more general context. Yet another problem
envisaged in the description of T-C sections of phase diagrams is whether it is permissible to assume that the
entities A and B in the above pair are always compositions. For example, binary phase diagrams may be
specified with two pure substances, as in the alumina-silica binary phase diagram. However, it is not surprising
that, when you ask an intelligent agent (such as a graduate student) to find an alumina-silica phase
diagram, the agent comes up with an Al2O3-SiO2 pseudo-binary section found in a ternary phase diagram
handbook. Because a formal ontology, like any other computer system, is not flexible, we need a little trick
to let our system behave as intelligently as this. To see why, suppose that we have implemented binary phase
diagram as a class with an attribute that holds a pair of pure substances. The corresponding attribute of
the class pseudo-binary phase diagram takes a pair of compositions. Because of this, these two classes
become mutually disjoint.
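
The composition-equivalence issue raised at the start of this paragraph is deferred by the authors, but a rough illustration may help. The sketch below assumes a label syntax of the form "balance element, then percent%element terms" and assumes that both labels use the same percentage basis; the parser and its names are hypothetical and are not part of MOOD-SX.

```python
import re

# One added component, e.g. "20%Cr" (assumed label syntax).
_TERM = re.compile(r"(\d+(?:\.\d+)?)%([A-Z][a-z]?)")


def canonical_composition(label):
    """Turn a label such as 'Fe-20%Cr' into a frozenset of (element, percent).

    The first element is treated as the balance; every later term is
    '<percent>%<element>'.
    """
    base, *terms = label.split("-")
    amounts = {}
    for term in terms:
        match = _TERM.fullmatch(term)
        if match is None:
            raise ValueError(f"cannot parse term {term!r}")
        amounts[match.group(2)] = float(match.group(1))
    amounts[base] = 100.0 - sum(amounts.values())
    return frozenset((element, round(pct, 6)) for element, pct in amounts.items())


def section(a, b):
    """Unordered pair of canonical compositions."""
    return frozenset((canonical_composition(a), canonical_composition(b)))


same_composition = canonical_composition("Fe-20%Cr") == canonical_composition("Cr-80%Fe")
same_section = section("Fe-10%Ni", "Fe-20%Cr") == section("Cr-80%Fe", "Fe-10%Ni")
```

With both normalizations in place, the two ways of writing the pseudo-binary section quoted earlier compare as equal.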

At first sight, pseudo-binary phase diagram appears to be an extension of binary phase diagram. One
may, therefore, make it a subclass of binary phase diagram. The attribute, section, of binary phase
diagram is then inherited by pseudo-binary phase diagram, but the constraint on its value, a pair of pure substances,
is inappropriate for pseudo-binary phase diagram. Overriding the constraint with a pair of compositions is not
allowed owing to the nature of the formal system. When we tell the system that pseudo-binary phase diagram
is a subclass of binary phase diagram, we permit the system to treat an instance of the former as an instance of the
latter. The system may, therefore, try to manipulate a composition in an instance of pseudo-binary phase
diagram as a pure substance and run into trouble. The converse holds when binary phase diagram is made
a subclass of pseudo-binary phase diagram. These two classes, therefore, should not be placed in the same line
of inheritance and should be made mutually disjoint. Database search and data mining based on an
ontology, however, generally rely on the class hierarchy. A query for an alumina-silica binary phase diagram cannot find
diagrams represented as instances of a disjoint class, pseudo-binary phase diagram.
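
The limitation can be reproduced in a few lines. The sketch below is an illustration in Python rather than MOOD-SX; the `query` function is a hypothetical stand-in for a class-hierarchy-based search, and substances are represented by plain strings for brevity.

```python
class PhaseDiagram:
    def __init__(self, section):
        self.section = frozenset(section)  # unordered pair


class BinaryPhaseDiagram(PhaseDiagram):
    """Section is a pair of pure substances."""


class PseudoBinaryPhaseDiagram(PhaseDiagram):
    """Section is a pair of compositions; disjoint from BinaryPhaseDiagram."""


def query(store, cls, section):
    """Return stored diagrams of class `cls` (or its subclasses) with this section."""
    wanted = frozenset(section)
    return [d for d in store if isinstance(d, cls) and d.section == wanted]


store = [PseudoBinaryPhaseDiagram(("alumina", "silica"))]

# A search restricted to the binary class misses the stored diagram ...
missed = query(store, BinaryPhaseDiagram, ("silica", "alumina"))
# ... while a search from a common superclass finds it.
found = query(store, PhaseDiagram, ("silica", "alumina"))
```

The second query succeeds only because both classes share a superclass, which is the direction the following subsection takes.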


Identification  of  Sections:  Proposed  Solution 

A solution that removes this limitation is as follows. Let us start with the fact that alumina and silica are
instances of compound, which is a kind of pure substance, while neither Fe-20%Cr nor Fe-10%Ni is
considered to be a substance. Note that it is still appropriate to speak of the composition of all of Fe-20%Cr,
alumina, and so on. These things are, therefore, commonly characterized by
composition. It is also possible to speak of the composition of simple substances, e.g., “the pure substance aluminum
consists of 100% of the single element aluminum”. These things, therefore, can include simple substances.
Unfortunately, we do not have a name for these things for which mentioning composition is relevant, so
we need to invent one. Let us call them, provisionally, pseudo-substance.

The classification of pseudo-substance is a big issue that involves the examination of all forms of
existing materials. The following may be the briefest and most agreeable one:

pseudo-substance
    pure substance
        simple substance
        compound
    mixture
        alloy

Here, mixture includes materials like Fe-20%Cr, while alumina is an instance of compound. However, the
distinction between compound and mixture is not always clear. The intermetallic compound Ni3Al, for example,
is generally considered not as a pure substance but as a phase that appears among a series of Ni-Al alloys.
We may argue the same for Al2O3. It is, therefore, considered permissible to adopt a more relaxed attitude
to the difference between compound and mixture, at least in the interpretation of phase diagrams as
described in the following.

Corresponding  to  pseudo-substance,  let  us  assume  a  concept,  T-C  phase  diagram,  that  includes  both  binary 
and  pseudo-binary  phase  diagrams.  This  concept  is  convenient  to  distinguish  these  diagrams  from 
isothermal  phase  diagrams  that  are  omitted  from  the  present  consideration: 

phase diagram
    isothermal phase diagram
    T-C phase diagram
        binary phase diagram
        pseudo-binary phase diagram

This  class,  T-C  phase  diagram,  is  characterized  with  an  attribute,  section,  whose  value  is  constrained  to  a 
pair  of  pseudo-substances: 




#T-C phase diagram
|-section=#pair
    |-of=#pseudo-substance
    |-and=#pseudo-substance

Binary  phase  diagram  inherits  this  attribute  but  the  constraint  is  made  more  restrictive  there.  Its  section 
must  be  specified  with  a  pair  of  pure  substances.  Alumina-silica  phase  diagram  is  thus  represented  as 
follows: 

#binary phase diagram
|-section=#pair
    |-of=alumina
    |-and=silica


In pseudo-binary phase diagram, this restriction is not possible because the class includes sections from a pure
substance to a mixture, such as Fe vs. Ni-50%Cr. The alumina-silica pseudo-binary phase diagram may be
represented equally well in either of the following two ways:

#pseudo-binary phase diagram
|-section=#pair
    |-of=alumina
    |-and=silica

#pseudo-binary phase diagram
|-section=#pair
    |-of=#mixture
    |   |-composition=Al-60mol%O
    |-and=#mixture
        |-composition=Si-66.7mol%O

Our intelligent system should have been told, or given a method to infer, that alumina is a pseudo-substance
with a composition Al-60mol%O and so on. When these representations are compared with each
other, therefore, it is possible to let the system up-cast all of these instances to a common representation as
follows and find that they all mean the same thing:

#T-C phase diagram
|-section=#pair
    |-of=#pseudo-substance
    |   |-composition=Al-60mol%O
    |-and=#pseudo-substance
        |-composition=Si-66.7mol%O
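
The up-cast described above can be mimicked with frozen dataclasses. This is a minimal sketch in Python, not MOOD-SX code. It assumes that a composition is stored as a set of (element, mol%) items, that Al-60mol%O means 40 mol% Al and 60 mol% O, and that the compositions of alumina and silica have already been supplied to the system; all class and function names are illustrative.

```python
from dataclasses import dataclass


def comp(**mol_percent):
    """Build a hashable composition, e.g. comp(Al=40.0, O=60.0)."""
    return frozenset(mol_percent.items())


@dataclass(frozen=True)
class PseudoSubstance:
    composition: frozenset  # set of (element, mol%) items


@dataclass(frozen=True)
class PureSubstance(PseudoSubstance):
    name: str = ""


@dataclass(frozen=True)
class Mixture(PseudoSubstance):
    pass


@dataclass(frozen=True)
class TCPhaseDiagram:
    section: frozenset  # unordered pair of pseudo-substances


@dataclass(frozen=True)
class BinaryPhaseDiagram(TCPhaseDiagram):
    def __post_init__(self):
        # Tightened constraint: both ends must be pure substances.
        if not all(isinstance(s, PureSubstance) for s in self.section):
            raise TypeError("binary sections take pure substances only")


@dataclass(frozen=True)
class PseudoBinaryPhaseDiagram(TCPhaseDiagram):
    pass


def up_cast(diagram):
    """Re-express any T-C diagram with plain pseudo-substances."""
    ends = frozenset(PseudoSubstance(s.composition) for s in diagram.section)
    return TCPhaseDiagram(ends)


alumina = PureSubstance(comp(Al=40.0, O=60.0), "alumina")
silica = PureSubstance(comp(Si=33.3, O=66.7), "silica")

binary = BinaryPhaseDiagram(frozenset({alumina, silica}))
by_name = PseudoBinaryPhaseDiagram(frozenset({alumina, silica}))
by_mixture = PseudoBinaryPhaseDiagram(frozenset({
    Mixture(comp(Al=40.0, O=60.0)),
    Mixture(comp(Si=33.3, O=66.7)),
}))

same = up_cast(binary) == up_cast(by_name) == up_cast(by_mixture)
```

The three instances are unequal as stored, since they belong to different classes or hold different kinds of members, and become equal only after the up-cast to the common superclass.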


Phases  and  Their  Appearances 

A record of a phase diagram is accompanied by the description of the section as discussed above.
It should have a pointer to the image data for the diagram. The diagram is composed of curves
delineating regions, and the regions are accompanied by the names of the phases stable there. The diagram
also contains insets that show the temperatures of invariant reactions. Although phase diagrams should be
universal truths like the Planck constant, those we currently have are artifacts, well-founded estimates obtained through
elaborate work. It is, therefore, appropriate to record the method of construction and a bibliography for
more detailed information on the construction process. Although each of these things has its own details
and deserves closer inspection, comments will be made here only on phases, and only briefly.

It should be noted that phases or phase names in phase diagrams do not directly indicate phases; they are
better regarded as labels that relate individual regions to descriptions given separately from the
diagrams. The reason for this remark is as follows. In the Cu-Fe system, for example, (Cu) and (γ-Fe) are
separate, distinct regions. However, in an isothermal section of the Cu-Fe-Ni phase diagram, (Cu) and (γ-Fe) form
one region through the (Ni) region. The Cu and γ-Fe phases are, therefore, in fact the same phase. This is true from a
physical point of view. We postulate the thermodynamic stability of one and the same phase, say fcc, which is
contiguous throughout the multi-dimensional concentration-temperature-pressure space. (Cu), (Ni), (γ-Fe)
and many other fcc regions are regions where this single fcc phase happens to exist as the most stable one
among many other structures. We need to distinguish each appearance of this phase in order to discuss (Cu) and (γ-Fe)
in the Cu-Fe phase diagram individually, as we are doing here, but the names (Cu), (γ-Fe), etc. do not
necessarily designate different phases.


CONCLUDING  REMARKS 

Those  items  described  in  the  previous  section  may  well  illustrate  the  nature  of  the  tasks  needed  in  the 
process  of  examination  and  encoding  of  items  in  our  domain  for  the  construction  of  an  ontology.  The 
previous  section  does  not  cover  many  other  important  entities  such  as  crystal  structures,  elements  and  so 
on.  We  need  to  carry  out  a  similar  analysis  for  all  these  things. 

One may have the impression that the formality required in ontology coding is too rigid to capture the
complexity at the core of materials science and engineering. Although this is in a way true, pursuing
such analysis is still considered worthwhile, since in its light we can at least foresee how
much we can expect from intelligent information services. Practical systems may be designed by
curtailing some part of the complexity. Consulting the results of these analyses will be beneficial in
providing a basis on which the designers of such systems can see what they are capturing as well as what
they are missing.

On the other hand, the impression may be totally wrong: however complex, the essential concepts of a
rational science should be encodable in an equally rational way. Further study is needed to see which
view is correct.


ACKNOWLEDGEMENT 

Thanks  are  due  to  Mr.  T.  Kawaminami  and  T.  Hanayama  for  their  help  in  carrying  out  this  work.  Financial 
support by the Grant-in-Aid for Scientific Research from the Ministry of Education, Science and Culture,
Japan  is  gratefully  acknowledged. 


REFERENCES 

1. N. Guarino, 1998. Some Ontological Principles for Designing Upper Level Lexical Resources. First
International Conference on Language Resources and Evaluation, Granada, Spain.

2. N. Ono. MOOD-SX system program package, http://mood.mech.tohoku.ac.jp/.

3. H. Okamoto, P. Villars and A. Prince, 1997. Handbook of Ternary Alloy Phase Diagrams, Version 1.0.
ASM International.

4. S. Borgo, N. Guarino and C. Masolo, 1996. Stratified Ontologies: The Case of Physical Objects.
Proceedings of ECAI-96 Workshop on Ontological Engineering, Budapest.

5. A. Gomez-Perez, M. Fernandez and A. J. de Vicente, 1996. Towards a Method to Conceptualize
Domain Ontologies. Proceedings of ECAI-96 Workshop on Ontological Engineering, Budapest.

6. P. E. van der Vet and N. J. I. Mars, 1998. Bottom-Up Construction of Ontologies. IEEE Trans.
Knowledge and Data Eng., 10, 513-526.

7. Free On-Line Dictionary of Computing, http://wombat.doc.ic.ac.uk/

8. T. R. Gruber, 1992. Ontolingua: A Mechanism to Support Portable Ontologies. Technical Report KSL
91-66, Stanford University.

9. Cycorp, Inc., 1999. The Upper Cyc Ontology, http://www.cyc.com/publication.html

10. O. Deux et al., 1990. The Story of O2. IEEE Trans. on Knowledge and Data Engineering, 2.




Prediction  of  Concrete  Mechanical  Behaviour  from 
Data  at  Lower  Ages  using  Artificial  Neural  Networks 

Jose  C.  Cassa,  Giovanni  Fioridia,  Andre  R.  Souza,  Rodrigo  T.  Oliveira 

Grupo  de  Estudos  em  Inteligencia  Computacional  Aplicada  (GEICAP) 
DCTM  -  Escola  Politecnica  -  Universidade  Federal  da  Bahia 
Rua Aristides Novis, 02 - Federação 40210-630 Salvador, Bahia, Brazil

Email: iccassa@ufba.br


ABSTRACT 

The complex mechanical behaviour of concrete is hard to model with traditional mathematical tools. This
paper shows how artificial neural networks can be used to predict the mechanical strength of concrete at the 28th day
using experimental results from the 1st and 7th days.


INTRODUCTION 

The materials science of cementitious materials has not developed as far as the science of other materials
(metals, polymers, ceramics and others), but there is a perception in the academic community that computing
may help to overcome some of the difficulties. One major challenge of concrete technology is the
development of a system to predict the mechanical behaviour of the material. The developments so far have been
based  on  empirical  experience  valid  only  under  special  situations.  Many  engineers  and  scientists  believe  that 
one  possible  way  to  represent  the  performance  of  concrete  is  through  reference  models  that  can  explain  its 
behaviour. 

This reference model must relate the organised knowledge by using a set of mathematical equations or other
equivalent methods such as a simulation model or alternative techniques suitable for complex phenomena.
Simulation models are probably the most viable way to represent the materials science and technology of
concrete. In traditional mathematical models, equations based on first principles or derived empirically from
experiments express the concepts included in the model. But questions related to concrete are normally
complex, such as:

a)  Which  factors  influence  the  behaviour  of  this  material  due  to  variation  of  one  or  more  components, 
mixing  proportion  and  cure  conditions? 

b)  Which  paths  are  possible  to  find  the  desired  end  conditions? 

Many  of  the  phenomena  involved  are  hard  (or  impossible)  to  describe  using  deterministic  or  stochastic  models. 
In  other  cases,  the  level  of  available  knowledge  is  not  yet  complete  for  its  representation  with  the  precision  and 
reliability  required.  For  these  cases,  neural  network  modelling  can  be  a  good  approach  for  reliable  predictions. 


ARTIFICIAL  NEURAL  NETWORKS 

The concept of artificial neural networks (ANN) was inspired by the hope of reproducing artificially some of the
flexibility and power of the human brain. Humans can recognise complex patterns even when there is no
complete understanding of the phenomena involved. The application of ANN to modelling systems that are
hard to represent by phenomenological explanation may look difficult at first, but it is easy to understand
for those who are familiar with least-squares modelling and optimisation.

A computer ANN is composed of individual elements called nodes, neurons or processing elements. Each
element receives several inputs through weighted connections, multiplies each input by its respective weight,
totals the results, and applies some type of transfer function to the input sum. Once the transfer function has
acted, the result is forwarded to the next processing element.
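
As a minimal sketch of the processing element just described, the weighted sum and transfer function can be written in a few lines. The function name, the optional bias term, the choice of a hyperbolic tangent and the numerical values are illustrative assumptions, not taken from the paper.

```python
import math

def neuron_output(inputs, weights, bias=0.0, transfer=math.tanh):
    """Return transfer(sum_i inputs[i] * weights[i] + bias)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return transfer(total)

# Hypothetical inputs and weights, chosen only for illustration.
y = neuron_output([0.5, -1.0, 2.0], [0.4, 0.3, 0.1])
```

The value `y` would then be passed on as one of the inputs of the elements in the next layer.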


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




In multi-layer perceptron (MLP) networks the processing elements are arranged in interconnected layers.
Usually there is an input layer to receive data, one or more hidden layers, and finally an output layer that
transmits the result of the network calculations. The hidden layer is essential to represent non-linear processes.
Another feature of the MLP is the bias vector, which can be regarded as an additional degree of freedom that can
be adjusted to obtain the desired network performance. Although many different ANN structures and training
techniques have been described in the literature [1], multi-layer perceptrons seem to be suitable for
modelling complex systems. In simple terms, this type of ANN works like a non-linear regression. As with
any curve-fitting technique, the model parameters can be found by least-squares optimisation.

The MLP designer must specify the number of hidden layers (usually one), the number of nodes in each layer,
and the transfer function of each layer. This is the MLP "architecture". As in regression methods, expertise,
judgement and trial-and-error are used to define a suitable architecture. Once the MLP architecture is
selected, the weights are adjusted to the values that minimise the sum of squared deviations between the ANN
outputs and the process outputs. The traditional training approach is called backpropagation and
usually works, but in some cases other non-linear optimisation methods may be necessary. Several
complementary techniques are available to speed the convergence of backpropagation training, such as better
initial conditions, a variable learning rate and, to avoid traps at local minima of the error function, a
momentum (impulse) term [2,3].
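
For a single hidden layer of hyperbolic tangent units with a linear output (one common choice, assumed here for illustration), the least-squares problem described above can be written as follows. The symbols $W_1$, $b_1$, $W_2$, $b_2$, $N$ and the learning rate $\eta$ are notation introduced for this sketch; $w$ stands for any single weight or bias.

```latex
\hat{y}(x) = W_2 \tanh(W_1 x + b_1) + b_2, \qquad
E = \sum_{k=1}^{N} \bigl( \hat{y}(x_k) - y_k \bigr)^2, \qquad
w \leftarrow w - \eta \, \frac{\partial E}{\partial w}
```

Plain backpropagation applies the last update repeatedly, with the derivatives obtained by the chain rule from the output layer back to the input layer.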

The  main  advantages  of  MLP  are: 

a)  Its  ability  to  grasp  very  complex  relationships  and  to  learn  the  process  behaviour  by  adjusting  to  real  data 
with  no  need  of  previous  basic  understanding  of  the  process  or  the  definition  of  a  number  of  rules  or 
equations.  So,  complex  systems  can  be  modelled  with  little  specialist  knowledge. 

b) Once an MLP has completed training, it can produce useful predictions despite the presence of some faulty
information. This robust behaviour allows an MLP to process noisy data, partial data and even foreign
data and still perform acceptably.

c) There is no need for prior screening of the most important variables, as the trained network learns the
relationship between responses and input variables, finding out automatically which inputs and
connections are important. If the training reduces the interaction between processing elements by
assigning low weights, irrelevant parameters can be eliminated by assigning null weights to
these connections (pruning techniques). This property makes ANN a more realistic approach, with no
need for simplification beforehand, and is particularly helpful in multivariable models.

d)  It  is  computationally  efficient.  This  characteristic  allows  ANN  to  replace  models  that  are 
computationally  intensive  making  real-time  model-based  applications  practicable. 

Although MLP neural networks can solve many modelling problems, other ANN architectures [1] such as
General Regression Neural Networks (GRNN), Probabilistic Neural Networks (PNN), and Polynomial
Networks (GMDH) may be able to do the same job more easily, as will be shown in this paper. The MLP
advantages listed above are also present in these ANN architectures.


PREDICTING  THE  MECHANICAL  STRENGTH  OF  CONCRETE 

The mechanical strength of concrete increases continuously as a function of time due to the evolution of the
hydration reaction of cement. In quality control tests, the mechanical strength of concrete is determined on
the 28th day according to most national standards (ABNT, ASTM, DIN, BS and others). In large concrete works
(e.g., bridges and dams) it is also useful to know the expected strength in the long term (1 to 50 years). The
traditional procedure for strength determination at any age consists of preparing concrete specimens with
standard geometry followed by mechanical testing as described in the technical literature. Under this scenario,
prediction models able to anticipate the quality of the materials produced are very useful for checking whether
the conditions specified in the project will be achieved or, if not, for indicating the need for some action to
satisfy the specifications.

Usually this prediction is carried out using methods based on a set of characteristics of the cement or
concrete under study. These methods can be grouped as:




a) Methods based on the strength obtained under conditions of accelerated hydration/cure

b) Methods based on the strength at normal cure conditions at low ages

c) Methods of correlation with other characteristics of cement or concrete (e.g., chemical composition and
fineness of cement, hydration heat developed in the concrete mixture or other characteristics).

The validity and limitations of these methods are described in the literature [4,5,6].

In the perception of the authors, the main difficulty in applying any of the methods mentioned above is the
identification of a suitable mathematical equation. The complexity of the physicochemical phenomena involved
in concrete formation is an inherent difficulty for the application of any computer model. The traditional models
available are:

a) Empirical models - obtained by fitting equations to experimental data using techniques such as
regression analysis. These models are relatively simple and their application is limited to the
experimental conditions tested.

b) Phenomenological models - based on fundamental equations of fluid flow in porous media, kinetic
equations representing solid-fluid reactions and the energy conservation laws. These models are complex
and require some simplifying hypotheses to lead to viable numerical solutions. The usefulness of any of
these models depends on its completeness and on how realistic the assumed operating conditions are.

The main objective of this paper is to present a model-free technique, based on ANN, to help professionals
working in concrete technology who are interested in knowing the mechanical strength in advance. These
systems must use only data available from the technological quality control of cement
or concrete production and should be able to predict with reasonable precision the mechanical behaviour under
expected conditions (higher ages).


THE  CONSIDERED  PROBLEM 

Prediction  of  mechanical  strength  of  concrete  at  traditional  age  of  control  (28th  day) 

In  order  to  illustrate  the  applicability  of  ANN  for  the  prediction  of  concrete  mechanical  properties,  a  simple 
problem  was  chosen.  This  problem  has  been  solved  satisfactorily  before  by  using  conventional  statistical 
techniques  so  that  the  efficiency  of  ANN  methods  can  be  compared  and  evaluated. 

Tango et al. [4,5] suggest the AMEBA method for predicting concrete strength at the 28th day from
experimental results at low (1 day) and middle (7 days) ages. The technical reasoning is well described in
the literature. Basically, the method assumes that there is a mathematical model suitable to represent the
mechanical strength of concrete as a function of time. In this case, there is a continuous, differentiable,
exact mathematical function such that

$r(t_3) = f[r(t_1), r(t_2)]$      1.

where $r(t_1)$, $r(t_2)$, $r(t_3)$, ... are the strengths $r(t_i)$ at given ages (high = Alta, middle = MEdia
and low = BAixa ages, forming the acronym AMEBA in Portuguese).

The model adopted in the AMEBA method is just an approximation of the exact mathematical model. The
usual mathematical models proposed in the literature contain intrinsic inadequacies derived from
simplifications in the original physical model and/or from the linearization of the equations involved in order
to find a viable solution. This discrepancy with the exact model can be evaluated by comparing experimental and
calculated results, using the mean square error (MSE) as model discriminator and thereby allowing the
proposed or available models to be ranked.
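
The ranking step can be sketched as follows. This is only an illustration: the strength values and model names are hypothetical, and the plain mean squared error is used because the normalisation behind the percentage values reported later in the paper is not specified.

```python
def mean_squared_error(measured, calculated):
    """Average of the squared deviations between two equal-length sequences."""
    if len(measured) != len(calculated):
        raise ValueError("sequences must have the same length")
    return sum((m - c) ** 2 for m, c in zip(measured, calculated)) / len(measured)

# Hypothetical strengths (MPa) and hypothetical model outputs, for illustration only.
measured = [30.0, 35.0, 40.0]
predictions = {
    "model_a": [31.0, 34.0, 42.0],
    "model_b": [30.5, 35.5, 40.5],
}

# Models sorted from lowest to highest MSE.
ranking = sorted(predictions, key=lambda name: mean_squared_error(measured, predictions[name]))
```

With these numbers `model_b` ranks first because its deviations from the measured values are smaller.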

Prediction models for other ages (i.e., 1 to 50 years) can be developed from the available database. In these
cases, mechanical strengths at 7 and 28 days can be used to predict results at 1 year, and experimental data at
28 days and 1 year to predict strengths at 5, 10, 20 and 50 years (work in this direction is in progress at GEICAP -
UFBA).  Few  databases  displaying  such  mechanical  strengths  of  concrete  up  to  50  years  of  age  are  available  in 




the world. One of these is available at IPT (Sao Paulo, Brazil) for cements produced in Brazil [4]. The
experimental data for this problem were taken from [5] for concretes produced during a tunnel construction in
Sao Paulo; 40 results were used for training and another 40 for testing the proposed systems.


TRAINING  THE  NETWORK 

Using  Backpropagation 

The architecture of a Multi-Layer Perceptron (MLP) has to be adapted to the problem to be solved. The
number of inputs to the network is constrained by the problem, and the number of neurons in the
output layer is constrained by the number of outputs required by the problem. Backpropagation (BP) training
follows gradient descent on the error surface to minimise network error. Local minima may trap the network.
The more neurons in the intermediate layers, the more freedom a network has and the more variables there are
to optimise. Hecht-Nielsen [7] demonstrated that a hidden layer with (2n + 1) neurons, where n is the number of
input variables, can represent any continuous mathematical function. It is known that networks with biases and
at least one sigmoid neuron layer are capable of approximating any reasonable differentiable function [2,3].

The backpropagation learning rules are used to adjust the weights and biases of networks so as to minimise the
squared error of the network. This is done by continually changing the values of the network weights and
biases in the direction of steepest descent with respect to error. Networks trained by BP tend to give reasonable
answers when presented with inputs that they have never seen. Unfortunately, plain BP is slow because of low
learning rates and the large number of variables in multilayered networks, and it suffers from other difficulties
as well.
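
The steepest-descent procedure can be illustrated with a small, self-contained sketch. It assumes one hidden layer of hyperbolic tangent neurons and a single linear output, uses 2n + 1 = 5 hidden neurons for n = 2 inputs, and trains on a handful of hypothetical, scaled strength values. None of the numbers, names or settings below come from the paper.

```python
import math
import random

def forward(x, W1, b1, W2, b2):
    """One tanh hidden layer followed by a single linear output neuron."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    output = sum(w * h for w, h in zip(W2, hidden)) + b2
    return hidden, output

def mse(data, W1, b1, W2, b2):
    return sum((forward(x, W1, b1, W2, b2)[1] - t) ** 2 for x, t in data) / len(data)

def train(data, n_hidden, epochs=2000, lr=0.1, seed=0):
    """Batch gradient descent on the mean squared error (plain backpropagation)."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1 = [0.0] * n_hidden
        gW2 = [0.0] * n_hidden
        gb2 = 0.0
        for x, t in data:
            hidden, out = forward(x, W1, b1, W2, b2)
            d_out = 2.0 * (out - t) / len(data)        # derivative of the error w.r.t. the output
            gb2 += d_out
            for j, h in enumerate(hidden):
                gW2[j] += d_out * h
                d_hid = d_out * W2[j] * (1.0 - h * h)  # propagated back through tanh
                gb1[j] += d_hid
                for i, xi in enumerate(x):
                    gW1[j][i] += d_hid * xi
        # Steepest-descent step on every weight and bias.
        for j in range(n_hidden):
            W2[j] -= lr * gW2[j]
            b1[j] -= lr * gb1[j]
            for i in range(n_in):
                W1[j][i] -= lr * gW1[j][i]
        b2 -= lr * gb2
    return W1, b1, W2, b2

# Hypothetical scaled data: (strength at 1 day, strength at 7 days) -> strength at 28 days.
data = [((0.20, 0.55), 0.80), ((0.25, 0.60), 0.85), ((0.15, 0.45), 0.70), ((0.30, 0.70), 0.95)]
untrained = train(data, n_hidden=5, epochs=0)   # same seed, no updates: the starting point
trained = train(data, n_hidden=5)
error_before = mse(data, *untrained)
error_after = mse(data, *trained)
```

Comparing `error_before` with `error_after` shows how far the repeated steepest-descent steps reduce the squared error from the random starting point.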

Methods of improving backpropagation include techniques such as momentum and variable learning rate.
These improvements can make BP more reliable and faster by a factor of 10-20 on small problems
and perhaps more on larger problems. Momentum allows a network to respond not only to the local gradient
but also to recent trends in the error surface, so that the network can ignore small features in the error
surface and slide through local minima. Training time can also be decreased by the use of an adaptive learning
rate, which attempts to keep the learning step size as large as possible while keeping learning stable. The
learning rate is made to vary with the complexity of the local error surface.
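
A minimal sketch of these two improvements is given below. The acceptance rule (reject a step when the error grows by more than a few percent, then shrink the learning rate; enlarge it while the error keeps falling) is one common formulation, and the factors 1.05, 0.7 and 1.04 as well as the quadratic test surface are assumptions made for illustration.

```python
def train_momentum_adaptive(error, gradient, w, lr=0.01, momentum=0.9,
                            lr_inc=1.05, lr_dec=0.7, max_increase=1.04, steps=500):
    """Gradient descent with a momentum term and a simple adaptive learning rate."""
    velocity = [0.0] * len(w)
    current = error(w)
    for _ in range(steps):
        grad = gradient(w)
        # Momentum: blend the previous step with the new steepest-descent step.
        new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
        candidate = [wi + v for wi, v in zip(w, new_velocity)]
        new_error = error(candidate)
        if new_error > max_increase * current:
            # Error grew too much: reject the step, shrink the rate, drop the momentum.
            lr *= lr_dec
            velocity = [0.0] * len(w)
        else:
            if new_error < current:
                lr *= lr_inc   # still improving: try a larger step next time
            w, velocity, current = candidate, new_velocity, new_error
    return w, current, lr

# Hypothetical error surface: an elongated quadratic bowl with its minimum at the origin.
def bowl(w):
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)

def bowl_gradient(w):
    return [w[0], 10.0 * w[1]]

w_final, e_final, lr_final = train_momentum_adaptive(bowl, bowl_gradient, [1.0, 1.0])
```

In a real network `error` and `gradient` would be the network error and its backpropagated derivatives; the bowl only stands in for them here.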

Backpropagation can also be improved by using more favourable initial conditions than purely random
numbers. Nguyen and Widrow [8] demonstrated that a hidden layer with sigmoid/linear functions can be
viewed as performing a piecewise linear approximation of any learned function. It is shown that weights and
biases generated with certain constraints result in an initial network better able to form a function
approximation of an arbitrary function. Use of Nguyen-Widrow (instead of purely random) initial
conditions shortens training time by more than an order of magnitude.
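
The sketch below follows the commonly quoted form of this initialisation: each hidden neuron receives a random weight vector rescaled to a fixed length, and its bias is drawn so that the active regions of the neurons are spread over the input range. It assumes inputs scaled to [-1, 1]; the function name and the 0.7 factor are stated here as assumptions rather than details taken from the paper.

```python
import math
import random

def nguyen_widrow_init(n_inputs, n_hidden, seed=0):
    """Initial hidden-layer weights and biases in the style of Nguyen and Widrow."""
    rng = random.Random(seed)
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)   # target length of each weight vector
    weights, biases = [], []
    for _ in range(n_hidden):
        row = [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
        norm = math.sqrt(sum(w * w for w in row))
        weights.append([beta * w / norm for w in row])   # rescale to length beta
        biases.append(rng.uniform(-beta, beta))          # spread the active regions
    return weights, biases, beta

W, b, beta = nguyen_widrow_init(n_inputs=2, n_hidden=5)
```

These values would replace the purely random starting weights of the hidden layer before backpropagation begins.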

General  Regression  Neural  Networks 

General regression neural networks (GRNN) are feedforward networks based on probability density
functions. GRNNs train fast and perform well provided enough experimental data are available.
This network was developed in the statistical literature as kernel regression and rediscovered later [9] as a
new ANN architecture. Its topology consists of four layers: the input layer, a hidden layer working as a
classifier, summation neurons and the output layer. Training is done in one step, in which the training set is
copied into the hidden layer. The summation neurons process the kernel function. The network matches
any new input to the nearest one available in the classifier and then presents the output response. The
weights reduce to a smoothness factor that must be calibrated using some optimisation algorithm.
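
In its kernel-regression form, the prediction is a weighted average of the stored training targets, with weights that decay with distance from the new input. The sketch below assumes a Gaussian kernel and a single smoothness factor `sigma`; the training pairs are hypothetical values used only for illustration.

```python
import math

def grnn_predict(x, train_inputs, train_targets, sigma):
    """Kernel-weighted average of the stored training targets (Gaussian kernel)."""
    weights = []
    for xi in train_inputs:
        dist2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        weights.append(math.exp(-dist2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("sigma is too small for this query point")
    return sum(w * t for w, t in zip(weights, train_targets)) / total

# Hypothetical training pairs: (strength at 1 day, strength at 7 days) -> strength at 28 days.
train_inputs = [(10.0, 22.0), (12.0, 25.0), (15.0, 30.0)]
train_targets = [30.0, 34.0, 40.0]
estimate = grnn_predict((12.5, 26.0), train_inputs, train_targets, sigma=2.0)
```

A small `sigma` makes the network reproduce the nearest stored case, while a large one smooths the response towards the average of all targets, which is why this factor has to be calibrated.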

Polynomial  Neural  Networks 

Polynomial neural networks, also called the Group Method of Data Handling (GMDH), were invented by
Ivakhnenko in Russia and were later used as neural networks and enhanced by others [9].

GMDH  works  by  building  successive  layers  with  complex  links  (or  connections)  that  are  individual  terms  of 
a  polynomial.  These  polynomial  terms  are  created  by  using  linear  or  non-linear  regression.  The  initial 




layer is simply the input layer. The first layer created is made by computing regressions of the input
variables and then choosing the best ones. The second layer is created by computing regressions of the
values in the first layer along with the input variables. Once again, the best are chosen using a convenient
algorithm. This process continues until the network stops getting better (according to a specified criterion).
Note that in this way the architecture of the network is not chosen in advance but is developed iteration by
iteration.
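
The construction of one such layer can be sketched as follows. This is not the NeuroShell implementation: it assumes the classical choice of a quadratic polynomial in every pair of inputs, fitted by least squares on one part of the data and ranked on a separate validation part. All names, the small ridge term and the synthetic data are illustrative assumptions.

```python
import itertools
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def terms(u, v):
    """Quadratic polynomial terms in two variables."""
    return [1.0, u, v, u * v, u * u, v * v]

def fit_pair(rows, targets, i, j, ridge=1e-9):
    """Least-squares coefficients for one candidate neuron using inputs i and j."""
    design = [terms(r[i], r[j]) for r in rows]
    k = len(design[0])
    # Normal equations; the tiny ridge term only guards against a singular matrix.
    AtA = [[sum(d[p] * d[q] for d in design) + (ridge if p == q else 0.0)
            for q in range(k)] for p in range(k)]
    Atb = [sum(d[p] * t for d, t in zip(design, targets)) for p in range(k)]
    return solve(AtA, Atb)

def gmdh_layer(train, valid, keep=2):
    """Fit every input pair on `train`, rank by error on `valid`, keep the best."""
    (Xt, yt), (Xv, yv) = train, valid
    candidates = []
    for i, j in itertools.combinations(range(len(Xt[0])), 2):
        coef = fit_pair(Xt, yt, i, j)
        preds = [sum(c * z for c, z in zip(coef, terms(r[i], r[j]))) for r in Xv]
        err = sum((p - t) ** 2 for p, t in zip(preds, yv)) / len(yv)
        candidates.append((err, (i, j), coef))
    candidates.sort(key=lambda c: c[0])
    return candidates[:keep]

# Synthetic data (an assumption for the demo): the target depends only on inputs 0 and 1.
rng = random.Random(0)
X = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(40)]
y = [r[0] * r[1] for r in X]
survivors = gmdh_layer((X[:20], y[:20]), (X[20:], y[20:]))
```

The outputs of the surviving neurons would serve as inputs to the next layer, and layers would be added until the validation error stops improving.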

The  resulting  network  can  be  represented  as  a  complex  polynomial  description  of  the  model.  In  some 
respects,  it  is  like  using  regression  analysis  but  is  far  more  powerful.  GMDH  can  build  very  complex 
models  while  avoiding  overfitting  problems. 

NEUROSHELL GMDH [9] can recognise the most significant variables as it trains, and will display a list
of them. This software also has facilities to select the degree of expected model non-linearity (off, low,
medium and high) as well as the model diversity, through the maximum number of survivors (low, medium
and high). The length of the model is associated with its complexity (low, medium and high). The final model
can also be optimised by eliminating the less significant parameters.


RESULTS 

The considered problems were first modelled using a simple MLP with one hidden layer trained by
backpropagation. After several trials it was found that hyperbolic tangent functions in the hidden layer and
a linear function in the output layer gave the best predictions. Figure 1 represents graphically the optimised
MLP network. The predicted results (MSE = 6.62%) were compared with the performance of the AMEBA
method (MSE = 10.10%), indicating the advantage of MLP. These solutions correspond to a first attempt
at using ANN, which clearly indicates the power of this technique.


Fig. 1. Best simple MLP architecture for the problem (diagram labels: input variables, hidden layer, output variable).

In order to improve on the previous predictions, a search for new types of ANN architecture was performed. The
idea was to investigate all possible combinations of the number of hidden layers, the number of
neurons in each layer, and the type of transfer function in each neuron (including free allocation of these
transfer functions to any neuron). The best combination for the considered problem was a network with two
hidden layers (three tangent functions/three linear functions for layer one and three linear functions/four
tangent functions for layer two) and one output layer (linear function), with MSE = 5.47%. Only a small
improvement was obtained, indicating the adequacy of the traditional approach (simple MLP).

Other more advanced networks suitable to represent the studied problems were also considered, giving the
results summarised in Table 1. These networks correspond to the best architecture found for each case
considered. These applications were developed using MATLAB [2,3], NEUROGENETIC OPTIMIZER
[10] and NEUROSHELL [9]. In these cases two types of ANN gave good results: GRNN and GMDH.
GRNN works like interpolating polynomials, with the training set copied into the hidden layer. This type of
network leads to satisfactory results when enough reliable data are available. The authors have worked
with problems where GRNN was the best choice. It is interesting to note that GMDH models look like the
regression models that are familiar to researchers and engineers. The complexity necessary in any case will
depend on the desired precision and accuracy. In the studied cases, even simple models gave good results.




Table 1: Summary of results of best fitting some ANN architectures

MODEL          Mean Squared Error (MSE) - %
AMEBA*         10.10
MLP (best)     5.47
GRNN           5.70
GMDH           4.11

(*) statistical model


The usefulness of the systems developed can be assessed in Figures 2 to 4, where
calculated and experimental values are compared for the considered problem. In all
cases good agreement was found, indicating reasonable solutions for all proposed models.


Fig. 2. Calculated vs. Experimental Data for AMEBA method (statistical) - (MSE = 10.10%).


Fig. 3. Calculated vs. Experimental Data for best MLP (MSE = 5.47%).


Fig. 4. Calculated vs. Experimental Data for simple GMDH - Problem 1 (MSE = 4.11%).


CONCLUSIONS 

Building materials of complex mechanical behaviour, such as concrete, can be easily modelled using
artificial neural network techniques even if few experimental data points are available. The analysed
problem shows clearly the potential and usefulness of artificial neural networks. It is hoped that this paper
will encourage other applications of artificial neural networks to the prediction of the mechanical performance
of other complex materials, especially those for which no model is available or for which the traditional
models do not offer the desired precision.

In another work [11] the authors discuss the use of genetic algorithms in the automatic selection of neural
network architecture. Automating the model development process makes the application quite simple even
for end users not familiar with neural programming and non-linear modelling.


REFERENCES 

1. S. Haykin, 1994. Neural Networks - A comprehensive foundation. Prentice Hall.

2. MATLAB, 1992. MATLAB High Performance Numeric Computation Software. The Mathworks.

3. MATLAB, 1993. MATLAB Neural Network Toolbox. The Mathworks.

4. C. E. S. Tango, 1991. Um estudo do desenvolvimento da resistencia a compressao do concreto de cimento
Portland ate 50 anos. Boletim 160 - IPT.

5. C. E. S. Tango, et al., 1995. Planilha eletronica para previsao da resistencia do concreto. Anais do 37
REIBRAC - IBRACOM, 785-797.

6. L. M. Dragicevic and M. M. Rsumovic, 1987. Prognosis of characteristics of multicomponent materials on
the example of flexural strength of Portland cement. Cement and Concrete Research, 17, 47-54.

7. R. Hecht-Nielsen, 1990. Neurocomputing. Addison-Wesley.

8. M. Hagan, H. Demuth and M. Beale, 1996. Neural Network Design. PWS Publishing Company.

9. NEUROSHELL, 1996. NeuroShell 2. Ward System Group Inc.

10. BIOCOMP, 1997. NeuroGenetic Optimizer. Biocomp Systems Inc.

11. J. C. Cassa, et al., 1999. Prediction of Cement Paste Mechanical Behaviour from Chemical
Composition Using Genetic Algorithms and Artificial Neural Networks. IPMM 99 - The Second
International Conference on Intelligent Processing and Manufacturing of Materials - Honolulu, Hawaii,
July 10-15.




IMPROVING  THE  PREDICTION  ACCURACY  OF 
CONSTITUTIVE  MODEL  WITH  ANN  MODELS 

L.X.  Kong  and  P.D.  Hodgson 

School  of  Engineering  and  Technology,  Deakin  University, 
Geelong,  Vic  3217,  Australia. 


ABSTRACT 

The unified constitutive model developed by Estrin and Mecking has successfully been used in hot rolling
to provide information for the control of strip thickness. It has shown high accuracy in predicting the
hot strength of austenitic steels. However, materials can show quite different properties under
different deformation conditions, and the constitutive models cannot be generalised to cover a wide
range of compositions and deformation conditions; the potential of those models is therefore limited. In
this work, the robustness of the unified constitutive model is enhanced by incorporating an artificial
neural network model to predict the flow strength of austenitic steels with carbon content ranging from
0.0037 to 0.79%.

INTRODUCTION 

Collinson  et  al  [1]  experimentally  studied  the  effect  of  carbon  content  on  the  flow  strength,  static  and 
dynamic  recrystallisation  behaviour  of  C-Mn  steels  with  carbon  contents  ranging  from  0.0037  to  0.79 
wt.%.  It  was  observed  that  the  carbon  content  had  a  complex  effect  on  the  flow  strength  of  austenite. 
Under single peak dynamic recrystallisation conditions, the low carbon steel showed a continuous increase
in the critical strain for the initiation of dynamic recrystallisation as the strain rate of testing was
increased  and  the  temperature  reduced.  For  the  high  carbon  steel  the  critical  strain  for  dynamic 
recrystallisation  increased  up  to  a  strain  of  0.7  and  then  remained  constant,  despite  further  reductions  in 
the  temperature  or  increases  in  the  strain  rate.  Attempts  to  predict  the  strength  with  a  universal  model 
gave  high  prediction  errors  for  both  high  and  low  carbon  steels. 

Kong et al [2] studied the stress-strain behaviour of these austenitic steels with a modified artificial neural
network (ANN) model. In this model, three parameters developed from phenomenological and empirical
models were used as inputs to train the model in conjunction with deformation conditions and carbon
content: the work hardening rate, the square product of work hardening rate and stress, and the logarithm of
the Zener-Hollomon parameter, Z ($Z = \dot{\varepsilon} \exp(Q_{def}/RT)$, where $Q_{def}$ is the
activation energy). The modified model significantly improved the prediction of stress, with a reduction in
both the average and standard errors. However, the ANN model has limited capability for extrapolation,
which is required for industrial implementation, as has been discussed in [3].
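
As an illustration of the third input, the Zener-Hollomon parameter can be computed from the strain rate and the temperature. The activation energy used below is a hypothetical value, since none is given in this section, and the function and constant names are assumptions of this sketch.

```python
import math

R_GAS = 8.314  # universal gas constant, J/(mol K)

def zener_hollomon(strain_rate, temperature_c, q_def):
    """Z = strain_rate * exp(q_def / (R * T)), with T converted to kelvin."""
    temperature_k = temperature_c + 273.15
    return strain_rate * math.exp(q_def / (R_GAS * temperature_k))

# Hypothetical activation energy in J/mol, for illustration only.
Q_DEF = 300e3
ln_z = math.log(zener_hollomon(10.0, 1000.0, Q_DEF))
```

Because Z spans many orders of magnitude over the range of deformation conditions, its logarithm is the more convenient network input.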

In  the  current  work,  the  Estrin  Mecking  model  was  used  to  predict  the  hot  strength  of  low  carbon,  high 
carbon  steels  and  the  whole  range  of  steels,  respectively.  The  constitutive  predictions  were  then  compared 
with  those  of  an  ANN  model.  As  both  constitutive  and  ANN  models  were  advantageous  in  some  aspects, 
a  combined  model  was  then  developed. 

ESTRIN  MECKING  MODEL  AND  PREDICTIONS 

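The modified Estrin and Mecking model, which incorporated dynamic recrystallisation (DRX) is [3]:

$\sigma_{EM} = [\sigma_s^2 + (\sigma_0^2 - \sigma_s^2) \exp(-2B\varepsilon)]^{1/2}$,  $\varepsilon < \varepsilon_c$

$\sigma = \sigma_{EM} - X(\sigma_{EM} - \sigma_{ss})$,  $\varepsilon > \varepsilon_c$      1.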




and $X = \dfrac{\sigma_{EM} - \sigma}{\sigma_{EM} - \sigma_{ss}} = 1 - \exp(-a(\varepsilon - \varepsilon_c)^b)$      2.

where $\varepsilon_c$ is the critical strain. When the strain is smaller than the critical strain, i.e., within
the work hardening region, the stress is predicted with the EM model. The Avrami equation [4] was incorporated
into the EM model for strains larger than the critical strain. During the development of the modified EM
model, the constants in Eq. 1, such as B, the saturation stress $\sigma_s$, the steady state stress
$\sigma_{ss}$ and the critical strain $\varepsilon_c$, and the constants a and b in Eq. 2 were determined from
experimental data.
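
A minimal sketch of Eqs. 1 and 2 as a single function is shown below. The constants are hypothetical values chosen only to produce a plausible curve; they are not the fitted constants of any steel in this work, and the function and argument names are assumptions.

```python
import math

def flow_stress(strain, sigma_0, sigma_s, sigma_ss, B, eps_c, a, b):
    """Modified Estrin-Mecking flow stress following Eqs. 1 and 2."""
    # Work hardening branch of Eq. 1.
    sigma_em = math.sqrt(sigma_s ** 2 + (sigma_0 ** 2 - sigma_s ** 2) * math.exp(-2.0 * B * strain))
    if strain <= eps_c:
        return sigma_em
    # Avrami-type softening fraction, Eq. 2.
    X = 1.0 - math.exp(-a * (strain - eps_c) ** b)
    return sigma_em - X * (sigma_em - sigma_ss)

# Hypothetical constants, for illustration only (stresses in MPa).
params = dict(sigma_0=50.0, sigma_s=200.0, sigma_ss=150.0, B=5.0, eps_c=0.3, a=8.0, b=1.5)
curve = [(e / 100.0, flow_stress(e / 100.0, **params)) for e in range(0, 101, 5)]
```

The curve starts at the initial stress, rises towards the saturation stress, and after the critical strain falls towards the steady state stress as X approaches one.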


The evaluation of the constants in Eqs. 1 and 2 has been discussed in [3]. For a steel deformed under
certain conditions, Eq. 1 was used to predict the hot strength in the work hardening regime. Generally, the
Estrin and Mecking model [5] is used to obtain the saturation stress $\sigma_s$ and the constant B,

$\theta\sigma = A - B\sigma^2$      3.

where $\theta$ is the work hardening rate, and the saturation stress is

$\sigma_s = \sqrt{A/B}$      4.
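
Assuming the usual definition of the work hardening rate, $\theta = d\sigma/d\varepsilon$ (not stated explicitly in the text), Eq. 3 can be integrated to recover the work hardening branch of Eq. 1 and the saturation stress of Eq. 4:

```latex
\theta\sigma = \sigma\,\frac{d\sigma}{d\varepsilon}
             = \frac{1}{2}\,\frac{d(\sigma^2)}{d\varepsilon}
             = A - B\sigma^2
\;\Longrightarrow\;
\frac{d(\sigma^2)}{d\varepsilon} = -2B\left(\sigma^2 - \frac{A}{B}\right)

\sigma^2 = \frac{A}{B} + \left(\sigma_0^2 - \frac{A}{B}\right)\exp(-2B\varepsilon),
\qquad \sigma(0) = \sigma_0

\theta = 0 \;\Longrightarrow\; \sigma_s^2 = \frac{A}{B}, \qquad \sigma_s = \sqrt{A/B}
```

Substituting $\sigma_s^2 = A/B$ into the second line gives the expression for $\sigma_{EM}$ in Eq. 1.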

To obtain constants a and b, Eq. 2 is rewritten as,

$\ln a + b \ln(\varepsilon - \varepsilon_c) = \ln(-\ln(1 - X))$      5.
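
Because Eq. 5 is a straight line in $\ln(\varepsilon - \varepsilon_c)$, the constants a and b can be estimated by ordinary linear regression. The sketch below checks this on synthetic softening fractions generated from known, hypothetical constants; the function name is an assumption, and the fit requires strains above the critical strain and fractions strictly between 0 and 1.

```python
import math

def fit_avrami(strains, fractions, eps_c):
    """Estimate a and b from the line ln(-ln(1 - X)) = ln a + b ln(strain - eps_c)."""
    xs = [math.log(e - eps_c) for e in strains]
    ys = [math.log(-math.log(1.0 - X)) for X in fractions]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope of the line
    a = math.exp(mean_y - b * mean_x)    # the intercept is ln a
    return a, b

# Synthetic check: softening fractions generated with known, hypothetical constants.
a_true, b_true, eps_c = 8.0, 1.5, 0.3
strains = [0.35, 0.4, 0.5, 0.6, 0.7]
fractions = [1.0 - math.exp(-a_true * (e - eps_c) ** b_true) for e in strains]
a_fit, b_fit = fit_avrami(strains, fractions, eps_c)
```

With measured data the points would scatter around the line, and the quality of the fit would indicate how well the Avrami form describes the softening.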

The constants obtained with the above analysis are optimised to predict the hot strength of the specific
steel under the specific deformation conditions tested. The Estrin and Mecking approach provides a good
constitutive understanding of metals deformed at elevated temperatures and over a wide range of strain
rates, for example stainless steel [6] and austenitic carbon steel [2].


Using the Estrin and Mecking model, steels with carbon content ranging from 0.0037 to 0.79% were
analysed. Fig. 1 shows the prediction with the optimum constants (i.e., different constants for each curve)
for steels with carbon contents of 0.065 and 0.58%, which represent the two regions with markedly different
strain-stress behaviours [1]. As only the optimum value for the constants was used, the Estrin and
Mecking model can accurately predict the strain-stress behaviour of both low and higher carbon steel.


Fig. 1. Comparison between EM prediction with optimum constants and experimental results deformed at
a strain rate of $10\,\mathrm{s}^{-1}$ for two steels: (a) 0.065%C (b) 0.58%C
(Plots against strain; legend: experiment; 900, 1000, 1100.)

Although  the  modified  Estrin  and  Mecking  modelling  with  optimum  constants  can  predict  the  hot 
strength  with  very  high  accuracy,  it  cannot  predict  the  strain-stress  behaviour  of  other  steels  or  the  same 
steel  deformed  at  different  conditions.  This  requires  generalisation.  The  application  of  the  Estrin  and 




Mecking model using constants optimised over the entire data set to predict steels with lower and high
carbon content is shown in Fig. 2. The peak stress and strain for both steels deformed at a strain rate of
$10\,\mathrm{s}^{-1}$ and temperatures from 900 to 1100°C were accurately predicted, with the accuracy for
the 0.065%C steel being higher than that for the 0.58%C steel. A large error occurred in predicting the
saturation and steady state stresses for the 0.58%C steel deformed at temperatures of 900 and 1100°C.


Fig. 2. EM prediction for two steels: (a) 0.065%C (b) 0.58%C
(Plots against strain; legend: experiment; 900, 1000, 1100.)


As the peak strain of those steels varies with deformation conditions in a complex way [1], the generalised
model is unable to predict such complexity precisely (Fig. 3). For the steel with a carbon
content of 0.065%, the prediction of hot strength at temperatures of 1000 and 1100°C and a strain rate of
$10\,\mathrm{s}^{-1}$ is very accurate, but the peak strain at 1100°C is significantly underestimated.
However, although model generalisation sacrifices prediction accuracy for some conditions, it
provides accurate predictions for others. For example, for the steel with a carbon content of
0.58% deformed at a strain rate of $10\,\mathrm{s}^{-1}$, the prediction at temperatures from 900 to 1100°C
was highly accurate (Fig. 3(b)). As the 0.58%C steel is in the transition area between the two distinct regimes
where peak strain changes in different ways [1], the characteristics of the constants varying in either the
lower carbon or the high carbon steel region are not well developed. It has been observed that if the
generalised model was used, the accuracy of the prediction for the steel with a carbon content of 0.79% was
poor and the peak strain at lower temperatures was significantly overestimated [7].


Fig. 3. Generalised EM prediction: (a) 0.065%C (b) 0.58%C
(Plots against strain; legend: experiment; 900, 1000, 1100.)




Since the steels of low carbon content behaved differently from the high carbon steels, the Estrin
and Mecking model can be used separately for those two regimes to improve the prediction. This is
particularly encouraging as many constants in Eqs. 1 and 2 have a similar trend to that of peak strain and
vary in two regions. The prediction of those separate models for the two steels of 0.065% and 0.58% is
shown in Fig. 4. The peak stress and peak strain are predicted more accurately than with the
generalised model for all steels (Fig. 3).


[Figure: panels (a) and (b); horizontal axis: strain; legend: line = experiment, symbols = 900, 1000 and 1100°C]

Fig. 4. Separate EM prediction: (a) 0.065%C (b) 0.58%C

Using the Estrin and Mecking constitutive model, the prediction of the hot strength on steels with a wide
range of carbon content can be made in different modes. Generally, there is a conflict between the model
accuracy and the level of the model generalisation. If the model is used to predict a specific steel, the
prediction is very accurate. However, with the extension of the model to predict a wider range, the
accuracy becomes poorer. Hence the constitutive prediction shows a similar trend to the prediction using
an artificial neural network [2] in that the more generalised the model, the less accurate it is. The
prediction can be described from Fig. 5, where the different levels of model generalisation were used. For
the peak strain of all steels, the linear fitting in all three cases presented a very high error. For the
generalised Estrin and Mecking model to predict all steels with carbon content from 0.0037 to 0.79%, the
lower peak strain corresponding to a lower Zener-Hollomon parameter, Z, was predicted with a high
accuracy. However, for the high Zener-Hollomon parameter, the peak strain of the high carbon steels was
overestimated while the peak strain of the lower carbon steels was underestimated. If separate
models were used for low and high carbon steels, higher prediction accuracy was achieved compared to
the generalised model. It is also observed that the prediction of the peak strain on the high carbon steels
is more accurate than that on steels with lower carbon content. A higher accuracy of the prediction on one
constant does not guarantee the accuracy of the prediction on the strain stress behaviour. This can be seen
from Fig. 4, in which the inaccurate prediction of the steady state stress for both steels led to a high
prediction error in the dynamic recrystallisation regime although the peak strain was more accurately
predicted than by the generalised model for all steels.


INTEGRATION  OF  ANN  MODEL  WITH  EM  CONSTITUTIVE  MODEL 

From the analyses in the previous section, it is found that all variables need to be accurately predicted to
improve the prediction accuracy in both work hardening and dynamic recrystallisation regions. Although
the employment of separate constitutive models (Fig. 4) improved the prediction of constants such as peak
strain, it was not able to accurately predict constants such as B, as they varied in a more complex way
(Fig. 6). For both low and high carbon steels, linear fitting of those constants gave a high error. As the
development of a universal model is difficult, the prediction of the constants in the Estrin and Mecking model
with other techniques needs to be explored.




Artificial Neural Networks have successfully been used to predict the nonlinear relationship between
inputs and outputs. Although a direct mathematical description cannot be given, an ANN model is able to
predict complex relations such as strain stress behaviours of austenitic steels [2] if appropriate training
strategies and training and test data sets are employed. To provide a more accurate prediction of the constants
in Eqs. 1 and 2, a multilayer perceptron ANN was used. The supervised feedforward network was trained
with the standard backpropagation algorithm. In the three layers used, the inputs were carbon content
(C), temperature (T), strain rate ($\dot{\varepsilon}$), Zener-Hollomon parameter (Z) and activation energy (Qdef), and the
outputs were the constants in Eqs. 1 and 2. One hidden layer was used with 15 hidden nodes.
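
As a rough illustration of the network described above (five inputs, one hidden layer of 15 nodes, trained by backpropagation), the following Python sketch trains such a network on synthetic stand-in data. The tanh hidden activation, linear output nodes, learning rate, number of output constants (two) and the data themselves are assumptions made for illustration only; none of them is specified in the text.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    # One weight row per output node; the last entry of each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(x, w_hid, w_out):
    xb = list(x) + [1.0]
    h = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in w_hid]
    hb = h + [1.0]
    y = [sum(w * v for w, v in zip(row, hb)) for row in w_out]
    return h, y

def train_step(x, target, w_hid, w_out, lr):
    # One backpropagation update for the squared error 0.5 * sum((y - t)^2).
    h, y = forward(x, w_hid, w_out)
    xb = list(x) + [1.0]
    hb = h + [1.0]
    err = [yi - ti for yi, ti in zip(y, target)]
    # Hidden-layer deltas are computed before the output weights change.
    delta_h = [(1.0 - h[j] ** 2) * sum(err[k] * w_out[k][j] for k in range(len(err)))
               for j in range(len(h))]
    for k, row in enumerate(w_out):
        for j in range(len(row)):
            row[j] -= lr * err[k] * hb[j]
    for j, row in enumerate(w_hid):
        for i in range(len(row)):
            row[i] -= lr * delta_h[j] * xb[i]

def mean_squared_error(data, w_hid, w_out):
    total = 0.0
    for x, t in data:
        _, y = forward(x, w_hid, w_out)
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, t))
    return total / len(data)

rng = random.Random(0)
# Synthetic stand-in data: five scaled inputs (playing the role of C, T,
# strain rate, Z and Qdef) mapped to two hypothetical model constants.
data = []
for _ in range(40):
    x = [rng.uniform(-1.0, 1.0) for _ in range(5)]
    t = [0.5 * x[0] - 0.3 * x[3], math.sin(x[1]) * x[2]]
    data.append((x, t))

w_hid = init_layer(5, 15, rng)   # 5 inputs -> 15 hidden nodes
w_out = init_layer(15, 2, rng)   # 15 hidden nodes -> 2 outputs
mse_before = mean_squared_error(data, w_hid, w_out)
for _ in range(200):
    for x, t in data:
        train_step(x, t, w_hid, w_out, lr=0.02)
mse_after = mean_squared_error(data, w_hid, w_out)
```

In practice the inputs would be scaled to comparable ranges before training (log(Z), for instance, spans roughly 12 to 18 in Figs. 5 and 6), and a held-out test set would be used to judge generalisation.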


[Figure: horizontal axis Log(Z), ticks from 12 to 18]

Fig. 5. Linear fitting of peak strain with different models
Fig. 6. Variation of constant B with log(Z)

The prediction of the constant B and peak strain using the ANN model is shown in Fig. 7. For constant B,
the ANN prediction is much more accurate than the Estrin and Mecking model, with Pearson's
correlation coefficients of 0.88 and 0.032 for the ANN and EM models, respectively. As a mathematical
function could hardly be employed to fit the constant B (Fig. 6), it is not surprising that linear fitting
presented such a low accuracy. For peak strain $\varepsilon_p$, both linear fitting and ANN prediction presented a
higher accuracy, with Pearson's correlation coefficients of 0.40 for linear fitting and 0.96 for the ANN, as
a trend can be developed for both high and low carbon steels. However, the prediction of the ANN model is
still much more accurate than the linear fitting (Figs. 6 and 7(b)).
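
For reference, Pearson's correlation coefficient between measured and predicted values can be computed as in the short Python sketch below. The sample values are invented for illustration and are not the data behind Fig. 7.

```python
import math

def pearson_r(xs, ys):
    # Pearson's correlation coefficient of two equally long sequences.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical measured and predicted peak strains.
measured = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.39]
r = pearson_r(measured, predicted)
```

A value close to 1 indicates that predictions rise and fall with the measurements; it does not by itself show that the predicted magnitudes are correct.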


[Figure: scatter plots; legend: ♦ low carbon, □ high carbon]

Fig. 7. ANN prediction of constants: (a) Constant B (b) Peak strain




[Figure: panels (a) and (b); horizontal axis: strain]

Fig. 8. Comparison between EM prediction with optimum constants and experimental results:
(a) 0.065% (b) 0.58%

Due to the improvement of the prediction accuracy with the ANN model on the constants, the prediction
of the strain stress behaviour improves significantly (Fig. 8). Every aspect of both the work hardening
and dynamic recrystallisation regimes is accurately represented. Unlike the linear fitting of the constants in
the Estrin and Mecking model, the ANN provides precise information on peak strain, saturation stress and
steady state stress for low and high carbon steels, which leads to the accurate prediction of the hot strength.


CONCLUSION 

The strain stress behaviour of the austenitic steels with carbon content ranging from 0.0037 to 0.79% was
studied using the Estrin and Mecking unified constitutive model. Due to the complex influence of the chemical
composition, the constitutive model was not able to accurately predict the behaviour in both the work
hardening and dynamic recrystallisation regions. With the incorporation of the ANN model into a
modified Estrin-Mecking phenomenological model to predict the variables in the constitutive model, both
the work hardening and dynamic recrystallisation regimes were accurately predicted. As the integrated
constitutive ANN model also accurately predicts the material properties [2], the integration of constitutive
and artificial neural network models provides an excellent approach to enhance the capability of the
constitutive model.


REFERENCES 

1. D. C. Collinson, P. D. Hodgson, B. A. Parker, 1993. Modelling the Effect of Carbon Content on the
Hot Strength of Steel. Modelling of metal rolling processes, pp. 283-295, London.

2. L. X. Kong, P. D. Hodgson, D. C. Collinson, 1998. Modelling the Effect of Carbon Content on Hot
Strength of Steels Using a Modified Artificial Neural Network. ISIJ International, 38, 1121-1129.

3. L. X. Kong, P. D. Hodgson, D. C. Collinson, 1998. Application of an Integrated Phenomenological
Neural Network Model to Extrapolatively Predict the Hot Strength of Austenitic Steels. Steel
Rolling'98, Tokyo, pp. 540-545.

4. M. Avrami, 1939. Kinetics of phase change. Pt. I: general theory. J. Chemical Physics, 7, 1103-1112.

5. Y. Estrin and H. Mecking, 1984. A unified phenomenological description of work hardening and
creep based on one-parameter models. Acta Metall., vol. 32, pp. 57-70.

6. P. D. Hodgson, L. X. Kong, and C. H. J. Davies, 1999. Prediction of the Hot Strength in Steels with
an Integrated Phenomenological and Neural Networks Model. J. Mat. Proc. Techn., 87, 132-139.

7. L. X. Kong and P. D. Hodgson, 1999. Constitutive Analysis of Flow Strength of Austenitic Steels with
the Integration of ANN models. 8th Inter. Conf. on Mech. Behaviour of Materials, Canada.




Hybrid  Fuzzy  Modelling  Using  Simulated  Annealing  and 
Application  to  Materials  Property  Prediction 

Min-You  Chen  and  D.  A.  Linkens 

Department  of  Automatic  Control  and  Systems  Engineering 
The  University  of  Sheffield,  UK 
Email:  Minvou.Chen@shef.ac.uk  D.Linkens@shef.ac.uk 


ABSTRACT 

This  paper  proposes  a  hybrid  fuzzy  modelling  approach  using  a  self-organising  network  and  simulated 
annealing  algorithm  for  self-constructing  and  optimising  fuzzy  rule-based  models.  The  proposed  fuzzy 
modelling  procedure  consists  of  two  stages.  Firstly,  a  fuzzy  competitive  neural  network  is  exploited  as  a 
data  pre-processor  to  extract  a  number  of  clusters  which  can  be  viewed  as  an  initial  fuzzy  model  from 
engineering data. This step is used to perform fuzzy classification with the objective of obtaining a
self-generating fuzzy rule base. Secondly, simulated annealing (SA), a combinatorial optimisation technique, is
used  to  optimise  the  fuzzy  membership  functions.  The  application  of  this  approach  to  the  mechanical 
property  prediction  for  C-Mn-Nb  steels  is  given  as  an  illustrative  example. 


INTRODUCTION 

Fuzzy  modelling  has  become  an  important  subject  in  engineering  because  the  if-then  rule  mechanism  is 
easy  to  manipulate,  understand  and,  to  a  certain  extent,  is  domain  independent.  Also,  the  rapid  development 
of  hybrid  approaches  based  on  fuzzy  logic,  neural  networks  and  genetic  algorithms  enhances  the  fuzzy 
modelling  technology  significantly.  A  variety  of  different  fuzzy  modelling  approaches  have  been  developed 
and applied in system identification, control, prediction and pattern recognition [1-5]. However, many
existing methods that use gradient-descent learning to design and optimise the fuzzy rule-based model
encounter problems such as local minima, slow learning speed and large memory requirements when
they are applied to engineering practice. In material engineering, it is important to establish an appropriate
composition-microstructure-property  model  for  materials  development. 

Much research has been done on developing models which will predict the final properties of the steel after
rolling and cooling to room temperature as a function of the processing conditions and steel composition [6-8].
This kind of model usually has quite a number of input variables, such as the contents of
different kinds of chemical elements, microstructural parameters and process variables. Commonly used
neural network models and neuro-fuzzy models based on gradient-descent learning algorithms may become
trapped in local minima, and they are computationally inefficient for multivariable approximation problems.

To remedy the above problems, this paper proposes a hybrid fuzzy-modelling approach using a
self-organising network and simulated annealing algorithm to self-construct and optimise the fuzzy rule-base
and increase the computational efficiency of the modelling process. The proposed fuzzy model is built in
two stages. First, a competitive neural network is exploited to extract a number of clusters which can be
viewed as an initial fuzzy model from engineering data. This step performs fuzzy classification with the
objective of obtaining a self-generated fuzzy rule base. Domain knowledge is used for structure determination,
i.e., to determine relevant inputs, number of membership functions for each input, number of rules, types of
fuzzy models, etc. Second, simulated annealing (SA), a combinatorial optimisation technique, is used to
optimise the parameters in the antecedent and consequent parts of the fuzzy rules. The proposed approach
allows construction of a mechanical property prediction model for structural steels.

GENERATING  THE  FUZZY  RULE-BASED  MODEL 

A fuzzy reasoning model is considered as a set of rules in an IF-THEN form to describe the I/O relationship
of a complex system. Consider a collection of N data points $\{P_1, P_2, \ldots, P_N\}$ in an (n+1)-dimensional space


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




combining both input and output dimensions. Without loss of generality, a multi-input and single-output
(MISO) model is used as a generic representation of a fuzzy system. Thus, the I/O data pair can be
represented as:

$P_k = (x_{1k}, x_{2k}, \ldots, x_{nk}, y_k)$, $P_k \in R^{n+1}$, $k = 1, 2, \ldots, N$.


Let $x = (x_1, x_2, \ldots, x_n) \in R^n$ be the inputs and $y \in R$ be the output. The modelling problem is to identify the
non-linear function $y = f(x)$ with the given N input/output data pairs. A generic fuzzy model is
presented as a collection of fuzzy rules in the following form:

$R_i$: IF $x_1$ is $A_{1i}$ and $x_2$ is $A_{2i}$ ... and $x_n$ is $A_{ni}$ THEN $y$ is $f_i(x)$

where $x = (x_1, x_2, \ldots, x_n)$ and $y$ are linguistic input and output variables respectively, $A_{ji}$ are fuzzy sets of the
universes of discourse $U_j \in R$ ($j = 1, 2, \ldots, n$), and $R_i$ represents the ith rule, $i = 1, 2, \ldots, p$. Typically, $f_i(x)$ takes
one of the following three forms: singleton, fuzzy set or linear function of the input variables. To obtain the
fuzzy rules requires defining the types and parameters of the membership functions for all fuzzy sets.
Specifically, we chose singleton consequences and a Gaussian function as the form of the membership
functions. Fuzzy logic systems with centre-average defuzzification, product-inference rule and singleton
fuzzifier are of the following form:


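$$y = \frac{\sum_{i=1}^{p} y_i \left( \prod_{j=1}^{n} \mu_{ij}(x_j) \right)}{\sum_{i=1}^{p} \left( \prod_{j=1}^{n} \mu_{ij}(x_j) \right)} \qquad (1)$$

where $\mu_{ij}(x_j)$ denotes the membership function of $x_j$ belonging to the ith rule, i.e.,

$$\mu_{ij}(x_j) = \exp\left[ -\left( \frac{x_j - a_{ij}}{\sigma_{ij}} \right)^2 \right] \qquad (2)$$

where $\sigma_{ij}$ and $a_{ij}$ are the width and centre of the fuzzy membership function $\mu_{ij}(x_j)$ respectively.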
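
The inference formula of Equations 1 and 2 (Gaussian membership functions, product inference and centre-average defuzzification with singleton consequents) can be sketched in Python as follows. The function names and the two-rule example values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def gaussian_mf(x, centre, width):
    # Gaussian membership function in the form of Equation 2.
    return math.exp(-((x - centre) / width) ** 2)

def fuzzy_output(x, centres, widths, consequents):
    # Centre-average defuzzification with product inference (Equation 1).
    # centres[i][j] and widths[i][j] belong to rule i and input j;
    # consequents[i] is the singleton consequent of rule i.
    num = 0.0
    den = 0.0
    for a_i, s_i, y_i in zip(centres, widths, consequents):
        firing = 1.0
        for x_j, a_ij, s_ij in zip(x, a_i, s_i):
            firing *= gaussian_mf(x_j, a_ij, s_ij)
        num += y_i * firing
        den += firing
    # den can underflow to zero if x lies very far from every rule centre.
    return num / den

# Two hypothetical rules on two inputs.
centres = [[0.0, 0.0], [2.0, 2.0]]
widths = [[1.0, 1.0], [1.0, 1.0]]
consequents = [10.0, 20.0]
y_mid = fuzzy_output([1.0, 1.0], centres, widths, consequents)
```

For an input midway between the two rule centres both rules fire equally, so the output is the average of the two consequents; nearer to one centre, that rule's consequent dominates.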


Creating the initial fuzzy model is a clustering process which groups the data scattered in the space $R^{n+1}$ into a
collection of clusters. Since the goal is to minimise the objective function and the centres and widths are
adjustable later in the parameter-learning phase, it is unnecessary to spend much time assigning centres and
widths for the perfect cluster. Hence, we simply use a competitive learning network [9] to produce the
clusters. The purpose of this stage is to classify the given training data into a small number, say $p \ll N$,
of clusters using competitive learning. The network classifies vectors into one of the specified number of p
categories according to the clusters detected in the training data set. The training is performed in an
unsupervised mode, and the network undergoes a self-organising process. During training, dissimilar
vectors are rejected, and only the most similar one is accepted for weight-building. The procedure of rule
generation via competitive learning is as follows.


Assume that the input vector is $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$, and we have a set of training data $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$. The
learning algorithm treats the set of p weight vectors as variable vectors that need to be learned. The weight
adjustment criterion for this mode of training is the selection of $\mathbf{w}_m$ such that:

$$\|\mathbf{x} - \mathbf{w}_m\| = \min_{i=1,2,\ldots,p} \{ \|\mathbf{x} - \mathbf{w}_i\| \} \qquad (3)$$


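The index m denotes the winning neuron number corresponding to the vector $\mathbf{w}_m$, which is the closest
approximation to the current input $\mathbf{x}$. Note that the left side of Equation 3 can be rearranged as:

$$\|\mathbf{x} - \mathbf{w}_m\| = (\mathbf{x}^T\mathbf{x} - 2\mathbf{w}_m^T\mathbf{x} + \mathbf{w}_m^T\mathbf{w}_m)^{1/2} \qquad (4)$$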


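It is obvious from Equation 4 that, when the weight vectors are normalised to a common length, searching
for the minimum of the p distances on the right side of Equation 3 corresponds to finding the maximum
among the p scalar products:

$$\mathbf{w}_m^T\mathbf{x} = \max_{i=1,2,\ldots,p} (\mathbf{w}_i^T\mathbf{x}) \qquad (5)$$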

The  left  side  of  Equation  5.  is  the  activation  value  of  the  "winning"  neuron.  After  the  winning  neuron  has 
been  identified  and  declared  a  winner,  its  weights  must  be  adjusted  so  that  the  distance  calculated  in 
Equation  3.  is  reduced  in  the  current  training  step.  It  seems  reasonable  to  reward  the  weights  of  the  winning 




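neuron with an incremental change in weight in the negative gradient direction. Thus we have:

$$\nabla_{\mathbf{w}_m} \|\mathbf{x} - \mathbf{w}_m\|^2 = -2(\mathbf{x} - \mathbf{w}_m) \qquad (6)$$

$$\Delta\mathbf{w}_m = \alpha(\mathbf{x} - \mathbf{w}_m) \qquad (7)$$

where $\alpha \in (0,1)$ is the learning rate selected heuristically. The remaining weights $\mathbf{w}_i$, $i \neq m$, remain
unaffected. The learning rule in Equation 7 in the kth step is rewritten in a more formal way as follows:

$$\mathbf{w}_m^{k+1} = \mathbf{w}_m^{k} + \alpha(\mathbf{x} - \mathbf{w}_m^{k}) \qquad (8a)$$

$$\mathbf{w}_i^{k+1} = \mathbf{w}_i^{k}, \quad \text{for } i \neq m \qquad (8b)$$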

Learning according to Equations 7 and 8 is called the "winner-take-all" method, a common competitive
and unsupervised learning technique. As learning continues and clusters develop, the network weights
acquire similarity to input data within clusters. In contrast to the standard competitive learning algorithm,
we define the activation value of an output node as:

$O_i = \mathbf{w}_i$, $i = 1, 2, \ldots, p$,

where $\mathbf{w}_i = (w_{i1}, w_{i2}, \ldots, w_{in})^T$ represents the prototype of the ith fuzzy cluster in I/O space.
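
The winner-take-all update of Equations 3, 7 and 8 can be sketched in Python as below. This is a minimal illustration, assuming Euclidean distance for winner selection, a fixed learning rate and hand-picked initial prototypes; the toy data and all names are hypothetical.

```python
def winner(x, weights):
    # Index m of the prototype closest to x (Equation 3).
    def squared_distance(w):
        return sum((x_j - w_j) ** 2 for x_j, w_j in zip(x, w))
    return min(range(len(weights)), key=lambda i: squared_distance(weights[i]))

def wta_train(data, init_weights, alpha=0.1, epochs=20):
    # Winner-take-all learning: only the winning prototype moves (Equation 8a),
    # all other prototypes are left unchanged (Equation 8b).
    weights = [list(w) for w in init_weights]
    for _ in range(epochs):
        for x in data:
            m = winner(x, weights)
            weights[m] = [w_j + alpha * (x_j - w_j) for x_j, w_j in zip(x, weights[m])]
    return weights

# Toy data forming two well separated groups.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
prototypes = wta_train(data, init_weights=[(1.0, 1.0), (4.0, 4.0)])
```

After training, each prototype sits inside one group of points. With poorly chosen initial prototypes some nodes may never win, which is a known weakness of plain winner-take-all learning.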

After competitive learning, the p nodes produced in the competitive layer can be viewed as p data clusters
centred at $W = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_p\}$. Each cluster centre $\mathbf{w}_i = (w_{i1}, w_{i2}, \ldots, w_{in}, w_{i,n+1})$ is, in essence, a prototypical
data point that exemplifies a characteristic I/O behaviour of the system we wish to model. Hence each
cluster centre can be used as the basis of a rule that describes the system behaviour. Each vector $\mathbf{w}_i$ can be
decomposed into two components, $\mathbf{x}_i^*$ and $y_i^*$. The cluster centre vector can be denoted as $\mathbf{w}_i = (\mathbf{x}_i^*, y_i^*)$,
where $\mathbf{x}_i^* = (x_{i1}^*, x_{i2}^*, \ldots, x_{in}^*) = (w_{i1}, w_{i2}, \ldots, w_{in})$ and $y_i^* = w_{i,n+1}$. Thus, each cluster centre $\mathbf{w}_i = (\mathbf{x}_i^*, y_i^*)$ can
be viewed as a fuzzy rule that describes the system's local behaviour. Intuitively, cluster centre $\mathbf{w}_i$
represents the rule "IF input is around $\mathbf{x}_i^*$ THEN output is around $y_i^*$". Hence, the initial rule-base
consisting of p rules is created by this competitive learning. The deviation parameter of cluster i, $\sigma_i = \sigma_{ij}$, is
selected by using the average distance to the nearest m cluster centres:

$$\sigma_i = \frac{1}{m} \sum_{j=1}^{m} \|c_i - c_j\| \qquad (9)$$

where $c_j$ is the centre of the jth cluster nearest to cluster i. The obtained p prototypes are used to construct
the parameters of the fuzzy rule-base. So, the rule-base, composed of c fuzzy rules, is represented as:

$R_j$: IF $x_1$ is $A_{j1}$ and $x_2$ is $A_{j2}$ ... and $x_n$ is $A_{jn}$ THEN $y$ is $f_j$

where $R_j$ denotes the jth rule, $j = 1, 2, \ldots, c$; $A_{ji}$ is the fuzzy set defined by the Gaussian membership function
centred at $a_{ji}$, and $f_j = \sum_{i=0}^{n} b_{ji} x_i$ is the jth rule output with respect to the TSK model, where $x_0 = 1$. Thus, the
fuzzy inference model shown in Equation 1 is obtained.
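
The choice of the deviation parameter in Equation 9 can be illustrated with a short Python sketch. It assumes that Equation 9 is the plain average of the Euclidean distances from a cluster centre to its m nearest neighbouring centres, as the accompanying text states; the function name and the example centres are hypothetical.

```python
import math

def cluster_widths(centres, m):
    # Width of each cluster: mean Euclidean distance to its m nearest other centres.
    # Requires at least two centres; if fewer than m neighbours exist, all are used.
    widths = []
    for i, c_i in enumerate(centres):
        distances = sorted(math.dist(c_i, c_j)
                           for j, c_j in enumerate(centres) if j != i)
        nearest = distances[:m]
        widths.append(sum(nearest) / len(nearest))
    return widths

centres = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
widths_m1 = cluster_widths(centres, m=1)
widths_m2 = cluster_widths(centres, m=2)
```

Clusters with close neighbours receive narrow membership functions and isolated clusters receive wide ones, which gives the later parameter optimisation a reasonable starting point.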


OPTIMISING  THE  FUZZY  MODEL  USING  SIMULATED  ANNEALING 

After  structure  identification,  the  fuzzy  rule-based  model  must  be  trained  for  model  optimisation  via 
parameter-learning.  The  Simulated  Annealing  (SA)  algorithm  [10]  is  introduced  as  a  combinatorial 
optimisation  technique  to  find  the  optimal  parameters  of  the  membership  functions  in  the  fuzzy  rules.  SA 
comes from the similarity between mathematical optimisation procedures and annealing, the way a metal
cools into a minimum-energy crystalline microstructure. The optimisation process starts with an initial
set of membership function parameters. The fuzzy model corresponding to this set of parameters is used in




a simulation for the test disturbance signal to evaluate the cost for these parameters. Then a new solution is
selected randomly on a hyper-sphere with a certain radius around the previous solution. The cost for the
new solution is evaluated in a similar fashion, and the costs of the two solutions are compared. The
search starts at high temperatures, resulting in acceptance of both cost-increasing and cost-decreasing
solutions. The search is terminated when a certain degree of improvement is achieved in the cost function,
or when a pre-set lower temperature is reached. The SA method has a major advantage over other optimisation
methods in its ability to avoid local minima. However, the standard SA algorithm suffers from very slow
convergence due to its slow cooling. To improve the computational efficiency, the fast SA approach
proposed by Kirkpatrick [11] was adopted. The training procedure using the SA algorithm is as follows.

Step 1. Initialisation

Set up the initial temperature $T_0$, minimum temperature $T_{min}$, and cooling rate $\beta$. Define the cost
function E(x) as the Root Mean Squared Error (RMSE) of the model output. The SA algorithm starts
with the initial state vector $x_0$, produced by the competitive network. Set the iteration number k = 1.

Step 2. Generation of a new solution

A new solution based on the existing solution is generated by the following formula:
$x(k+1) = x(k) + \gamma D$

where $\gamma$ is a random vector whose elements are in the range [-1, +1], and D is a diagonal matrix that
represents the maximum allowable change in search parameters.

Step 3. Acceptance decision

Calculate the cost functions E(x(k)) and E(x(k+1)), and the acceptance probability $P = \min\{1, \exp(-\Delta E / T)\}$,
where $\Delta E = E(x(k+1)) - E(x(k))$ and T is the current temperature. If $P > \lambda$, where $\lambda \in [0, 1]$ is a random
number, then go to Step 4, otherwise go back to Step 2.

Obviously, the method not only accepts solutions that decrease the cost but also accepts solutions that
increase the cost with a probability given by $P = \exp(-\Delta E / T)$.

Step 4. Temperature scheduling

Decrease the temperature using the following expression: $T(k+1) = \beta T(k)$.

Step 5. Termination judgement

If $T(k) < T_{min}$ or $E(x(k)) < \varepsilon$, then stop the search; otherwise repeat from Step 2.
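
Steps 1 to 5 can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the quadratic cost stands in for the RMSE of a fuzzy model, the default parameter values are arbitrary, and the `max_trials` cap and the tracking of the best solution found are safeguards added here that the steps above do not mention.

```python
import math
import random

def simulated_annealing(cost, x0, max_change, t0=1.0, t_min=1e-3,
                        beta=0.95, eps=1e-6, seed=0, max_trials=100000):
    rng = random.Random(seed)
    # Step 1: initial state, cost and temperature.
    x, e = list(x0), cost(x0)
    best_x, best_e = list(x), e
    t = t0
    trials = 0
    # Step 5: stop when the temperature or the cost falls below its threshold.
    while t >= t_min and e >= eps and trials < max_trials:
        trials += 1
        # Step 2: perturb each parameter by at most its allowed change.
        candidate = [x_i + rng.uniform(-1.0, 1.0) * d
                     for x_i, d in zip(x, max_change)]
        # Step 3: always accept improvements, accept deteriorations
        # with probability exp(-delta / t).
        e_new = cost(candidate)
        delta = e_new - e
        p = 1.0 if delta <= 0 else math.exp(-delta / t)
        if p > rng.random():
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = list(x), e
            # Step 4: cool down after an accepted move.
            t *= beta
    return best_x, best_e

def quadratic_cost(v):
    # Toy cost with its minimum at the origin (stand-in for the model RMSE).
    return sum(c * c for c in v)

start = [3.0, -2.0]
best_x, best_e = simulated_annealing(quadratic_cost, start, max_change=[0.5, 0.5])
```

In the modelling procedure the state vector would hold the membership-function centres and widths together with the rule consequents, and the cost would be the RMSE of the fuzzy model output on the training data.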


APPLICATION  TO  MATERIAL  PROPERTY  PREDICTION 

The problem of modelling hot-rolled metal materials can be stated broadly as: given a certain material
which undergoes a specified heat treatment process, what are the final properties of this material? Typical
final properties of interest are mechanical properties such as tensile strength, yield strength, elongation,
etc. A trial-and-error approach to this problem is often taken in the materials industry, with many
different hot working conditions attempted to achieve a given final product. The obvious drawbacks of this
approach are large time and financial costs and a lack of reliable predictive capability. By using the
proposed hybrid fuzzy-modelling approach, we have developed composition-microstructure-property
models for hot rolled C-Mn-Nb steels. The experimental data of more than 300 different steels with
normalized heat treatment were used to train and test the fuzzy models which relate composition and
microstructure to mechanical properties. Simulations for two typical prediction models are given as follows:

Structure-Property  Model 

In this model, grain size ($D^{-1/2}$), Pearlite (%) and the percentage content of Nb (Nb%) are chosen as the
model inputs, and Ultimate Tensile Strength (UTS) is the model output. 340 experimental data records of
different steels are used for modelling. Half of the data set was used for training, the other half for testing.
Through self-organised learning and parameter optimisation using the SA algorithm, a 2-rule fuzzy model
was developed as shown in Figure 1.




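[Figure: membership-function panels for the inputs Nb%, $D^{-1/2}$ and Pearlite, one row per rule]

Rule 1: Then UTS = 325.40 + 48.08 Nb% + 39.17 $D^{-1/2}$ + 26.26 Pearlite

Rule 2: Then UTS = 502.92 + 75.93 Nb% + 6.18 $D^{-1/2}$ - 8.92 Pearlite

Fig. 1. The final fuzzy rule-based model for property prediction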

The simulation results from the trained fuzzy model, with 6% average prediction error of the UTS, are shown
in Figure 2(a) and (b). The predicted UTS versus measured UTS is plotted in Fig. 2(c). The graph shows
good agreement between measured and model-predicted values.


[Figure: plots titled "Prediction of Ultimate tensile strength" (MPa); horizontal axis of panel (c): measured UTS]

Fig. 2. Simulation results of the structure-property model.
In (a) and (b): solid line, measured UTS; dotted line, predicted UTS.

Composition-Microstructure-Property  Model 

The inputs to this model include the main chemical composition C%, Si%, Mn%, N%, Nb% and grain size
$D^{-1/2}$. The model output is Lower Yield Strength (LYS). Using the proposed hybrid fuzzy-modelling
approach, a final fuzzy model with 2 rules was constructed. The simulation results, with 5.3% average
prediction error, are shown in Figure 3. A direct comparison of the measured LYS values versus predicted
values over a wide range of samples is shown in Figure 3(c). The scatter of the points indicates that the
predictions and generality of the obtained fuzzy model are good.




[Figure: panels (a), (b) and (c), titled "Prediction of Lower yield strength" (MPa); horizontal axis of panel (c): measured LYS, 100 to 600]

Fig. 3. Simulation results of the composition-microstructure-property model.
In (a) and (b): solid line, measured UTS; dotted line, predicted UTS.


DISCUSSION  AND  CONCLUSION 

A  hybrid  fuzzy-modelling  approach  using  simulated  annealing  has  been  presented  and  applied  to  the  task  of 
predicting  material  properties.  The  main  characteristics  of  this  modelling  approach  are:  (1)  the  fuzzy  rule- 
base  can  be  generated  and  optimised  automatically  from  training  data  through  a  proposed  hybrid  fuzzy 
modelling  procedure;  (2)  by  using  simulated  annealing,  the  local  minima  problem  is  averted,  thus  global 
optimisation  of  the  membership  functions  is  possible;  (3)  the  acquired  fuzzy  model  has  a  simple  structure 
and is fast to compute. Simulation shows that the mechanical properties predicted by an optimised two-rule
model agree well with experimental data. This paper presents early work in the development of fuzzy models
for materials property prediction. Future work will aim to improve modelling accuracy, incorporate
linguistic information with numerical data, and apply the approach to microstructure modelling and property prediction.


REFERENCES 

1. M. Sugeno, G.T. Kang, 1988. Structure identification of fuzzy model. Fuzzy Sets & Sys., 28, 15-33.

2. M. Sugeno, T. Yasukawa, 1993. A fuzzy-logic-based approach to qualitative modeling. IEEE Trans.
Fuzzy Systems, 1(1), 7-31.

3. L.X. Wang, 1994. Modelling and control of hierarchical systems with fuzzy systems. Automatica,
33(6), 1041-1053.

4. J.R. Jang, 1992. Self-learning fuzzy controllers based on temporal back propagation. IEEE Trans. on
Neural Networks, 3(5), 714-723.

5. S. Marsili-Libelli, A. Muller, 1996. Adaptive fuzzy pattern recognition in the anaerobic digestion
process. Pattern Recognition Letters, 17(6), 651-659.

6. P.D. Hodgson, 1996. Microstructure modeling for property prediction and control. J. Mat. Proc. Tech., 60, 27-33.

7. C. Chen, Y. Cao, S.R. LeClair, 1998. Materials structure-property prediction using a self-architecturing
neural network. J. Alloys and Compounds, 279(1), 30-38.

8. C.A.L. Bailer-Jones, T.J. Sabin, D.J. Mackay, P.J. Withers, 1998. Prediction of deformed annealed
microstructures using Bayesian networks and Gaussian processes. Proc. Int. Conf. on Forging and
Related Technology, Birmingham, U.K., 913-919.

9. T. Kohonen, 1984. Self-organization and associative memory, Springer-Verlag, Berlin.

10. S. Kirkpatrick, C.D. Gelatt, M.P. Vecchi, 1983. Optimization by simulated annealing, Sci., 220, 671-680.

11. S. Kirkpatrick, 1984. Optimization by simulated annealing: quantitative studies, J. Stat. Phys., 34, 975-986.




Intelligence  in  Materials  Science  I 




Inorganic  Glasses:  Old  and  New  Structures 
on  the  Eve  of  the  21st  Century 

J.  Sestak  *  ,  B.  Hlavacek  *  and  N.  Koga  * 

*  Institute  of  Physics,  Czech  Academy  of  Sciences,  Praha,  Czech  Republic 
+  Dept. of Polymers, University of Pardubice, Pardubice, Czech Republic
u  Chemistry  Laboratory,  Faculty  School  Education,  Hiroshima  University, 

Higashi-Hiroshima,  Japan 


ABSTRACT 

History has shown that glass is a remarkable noncrystalline substance, usually made naturally or artificially
from the simplest raw materials. Mimicking evolution, however, mankind has been responsible for the creation
of new families of a wide variety of glasses which gradually appeared through creative thinking, particularly
during the last hundred years. The process of rapid extraction of heat turned out to be successful in providing
quenching  treatments  to  assist  physicists  in  preparing  glassy  states  from  different  types  of  materials  (metals) 
in contrast to the previously traditional chemical approach which sought an appropriate composition to
vitrify  under  self-cooling  (silicates).  The  most  discussed  issue  is  the  thermodynamic  stability  of  the  glassy 
state  as  a  special  form  of  matter  with  its  low-dimensional  organisational  structure,  as  well  as  its 
classification within the hierarchy of noncrystalline solids. In this respect the most important quantity is
entropy. We can say that the major part of the entropy below the Vogel temperature, Tv, has its origin in the
thermal entropy contribution, Wtherm. When the temperature becomes higher than Tv, the configurational part
of the entropy, Wcf, starts to play a role. This Wcf part is mainly connected to the micro-configurational
displacements of particles. At and above the glass transition temperature, Tg, the conformational part of
entropy,  Wconf ,  which  is  connected  to  the  displacements  of  particles  through  diffusion  in  the  macro-sample 
is  involved.  It  seems  that  liquids  above  the  Tg  transition  are  formed  by  two  mechanically  distinct  "species". 
Under  the  Tg  temperature,  a  matrix  system  is  formed,  in  vast  majority,  by  particles  excited  just  to  the  lower 
level  of  the  amplitude  of  an  anharmonic  oscillator.  Above  Tg  ,  the  second  "species"  starts  to  appear  which 
is  formed  by  thermally-excited  particles  able  to  overcome  viscous  and  elastic  forces  of  the  matrix  in  their 
vicinity  and  bring  the  particles,  through  thermal  excitement  and  interactions  within  their  vicinity,  to  the 
upper  amplitude  levels  of  a  non-linear  oscillator.  The  thermally  excited  particles  thus  form  the  active  and 
ephemeral vacancy spaces. These vacancies have a very high expansion coefficient and are responsible for
the high expansion coefficient of liquids in general.


INTRODUCTION:  A  SHORT  HISTORY  OF  GLASS  MAKING 

Advancement  of  tailored  material  engineering  involves  exploitation  of  relationships  among  the  four  basic 
elements:  structure  and  composition;  properties;  performance;  synthesis  and  processing.  A  common  element 
that  links  the  great  diversity  of  work  in  materials  science  of  both  inorganic  and  organic  engineering  is  thus 
the  controlled  combination  of  atoms  and  molecules  in  large  segregation  in  ways  that  endow  resulting 
compounds  with  desirable  properties.  This  depends  not  only  on  the  chemical  nature  of  atomic  and 
molecular  constituents  but  also  on  the  degree  of  their  interactions,  organisation  and  freeze-in  phenomena 
with  the  greatest  flexibility  in  the  field  of  glasses.  This  depends  on  reshaping  the  concept  of  what  material 
science is and what role it plays in analysing modern ordering and/or disordering phenomena functional in
rigid  states  under  our  observation.  The  formation  of  new  states  long  customary  in  nature  but  also  aided  by 
intelligent  creatures  is  also  of  importance.  In  this  respect,  glass  is  a  remarkable  noncrystalline  [1,2]  substance, 
usually  made  from  the  simplest  of  raw  materials.  Mimicking  evolution,  however,  Man  has  become  responsible 
for  the  creation  of  a  whole  new  family  of  a  wide  variety  of  glasses  which  gradually  appeared  through  human 
creativity, particularly during the last hundred years, when the process of rapid extraction of heat turned out to be
responsible  for  successful  quenching  treatments  to  assist  physicists  in  preparing  glassy  states  from  different 


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




sorts of materials, in contrast to the previously traditional chemical approach which sought an
appropriate  composition  to  vitrify  under  self-cooling. 

The  first  natural  glasses  were  formed  as  the  earth  cooled  and  therefore,  they  pre-date  the  creation  of  living 
organisms  by  about  1.5  billion  years.  Such  primordial  glasses  were  limited  in  composition  and  versatility  as 
were  the  first  primitive  unicellular  organisms  (bacteria),  however,  some  compositions  (in  an  unstable  state  of 
glass)  have  survived  unchanged  for  enormously  long  periods  (similarly  to  certain  strains  of  bacteria).  The 
diversity  of  glasses  has  been  known  since  humans  learned  how  to  control  fire,  roughly  a  few  hundred  thousand 
years  ago.  The  first  man-made  glasses  were  synthesized  unintentionally  by  the  fortuitous  smelting  of  sand  and 
alkaline  plant  flux  by  fire  about  ten  thousand  years  ago.  Some  glasses  were  created  by  accidental  action  of 
spark  and,  particularly,  tektites  (natural  acidic  silicate  glasses  [3]  with  a  high  melting  point),  were  formed  by 
terrestrial  impact  and  volcanism.  They  have  attracted  the  attention  of  men  since  prehistoric  times,  having  been 
used  as  cutting  instruments,  amulets  or  cult  objects.  It  is  also  worth  noting  that  the  extent  of  natural  glass  on 
earth  is  in  the  range  of  a  tenth  of  a  percent  with  a  ratio  of  about  3450  minerals  versus  5  types  of  natural  glass 
while  on  the  moon  (and  possibly  also  other  planets)  it  is  possible  to  find  merely  60  minerals  against  35  glasses. 
So, the frequency of glass deposits on the moon is at least one order of magnitude higher than that on Earth.

Several notable milestones of modern science depended on the availability of new glass as a preeminent choice
of alchemists for their apparatus. Relatively unstable and fragile glass was always essential for many chemical
operations in early times. Dissatisfied with its chemical durability, glassmakers modified the properties of glass by
adjusting composition; however, this could not yet have been done properly as chemistry was still on a mystical
basis, and the techniques of analysis did not exist. An important cornerstone was Galileo's work on the motion
of  planets  based  on  glass  lenses  in  astronomical  telescopes,  as  well  as  Newton's  pioneering  work  in  optics 
(1666)  requiring  prisms  and  mirrors.  Other  basic  investigations  required  specific  glass  apparatus  to  describe  the 
properties  of  gases,  to  introduce  thermometry,  barometry  and  to  develop  microscopy.  The  first  reasonably 
documented  description  of  glass-making  procedures  is  associated  with  the  invention  of  lead  glass  around  1676. 
The most influential books appear to be Neri's "L'Arte Vetraria" (1612) and Kunckel's "Ars Vitraria
Experimentalis" (1679), which were translated into other languages (as were many others, such as the
Encyclopedia in 1765); all of them remained simple recipe manuals. In the middle of the seventeenth
century, a proper understanding of heat was still lacking: only three degrees of heat (calor, fervor,
and ardor) and of cold (frigus, algor and one unnamed) were recognized, and heat was sensed more as a kind of
chemical element and even treated as having a negative mass. From the Bohemian Comenius (born 1592) until the Scottish Black (born 1728),
temperature  and  heat  were  not  distinguished,  so  that  melting,  solidification  and  glass  formation  could  not  be 
understood  properly.  The  only  important  glass  properties  easily  measured  were  density  and  refractive  index. 

Since  medieval  times,  manual  skill  allowed  the  making  of  window  glass  by  the  crown  process  (forming  a 
shallow  bowl  and,  after  reheating,  spinning  to  make  it  open  up  into  an  almost-flat  circular  disc)  and  by  the 
cylinder  process  (blowing  a  cylinder,  cutting  off  the  ends  and  cracking  it  longitudinally)  often  mentioned  as 
"procede  de  Boheme".  The  nineteenth  century  showed  an  enormous  progress  in  optical  glasses  [4]  through 
effective  stirring  and  later  by  fruitful  investigation  of  property  versus  composition  relationships.  The  tank 
furnace made possible continuous large scale production, as well as machines for automated production of
containers  which  essentially  revolutionized  the  glass  industry.  By  the  turn  of  the  century,  Owens  produced  a 
successful  six  arm  rotary  machine  which  differed  from  most  others  in  the  way  it  was  supplied  with  glass,  and  it 
remained  almost  unchanged  until  the  1960s  when  it  was  supplanted  by  gob  feeders.  Patents  for  sheet  glass 
production  date  back  to  the  1850s  but  with  little  success.  The  invention  of  mechanically-drawing  sheet  glass 
was  made  by  the  Belgian,  Fourcault,  in  1914,  but  was  not  finalized  due  to  the  war.  The  original  serial  pulling 
was  later  improved  by  a  system  called  "Bohemian  cross",  but  the  most  important  large  scale  production  took 
place  in  the  1950s  by  a  rather  expensive  redundant  plate  process. 

The  production  of  foam  glass  and  fused  basalts  is  also  worthy  of  mention  in  the  same  period.  The  large  scale 
production  of  glass  fibers  for  insulation  and  for  textiles  was  another  important  twentieth  century  development 
as well as less known advances in high quality, dimensionally-accurate pressing or optical fibers, gradient
index glasses, etc. One of the most significant scientific achievements was Griffith's theory of the strength of
brittle materials (1920). X-ray diffraction analysis was a particularly exciting field having enormous impact on
glass science in the first quarter of the century. It led Zachariasen to consider his principles on how bonding




requirements  were  met  and  nearest  neighbor  coordination  maintained  without  imposing  an  exact  long  range 
order. 

Since  the  thirties  conferences  on  oxide  glasses  started  their  regular  series  followed  by  a  search  for  novel  glasses 
with  properties  not  previously  known  or  studied.  This  resulted  in  introduction  of  novel  families  of 
unconventional  glasses;  such  as  the  non-oxide  glasses  of  chalcogenides  which  exhibited  many  general  features 
shared  with  oxide  glasses.  Serial  conferences  began  to  take  place  in  the  fifties  and  the  most  widespread  ones 
came  later  with  the  development  of  xerography,  electropholarography  and  lithography.  Afterwards  there 
appeared  quite  unexpected  inorganic  systems  of  which  halide  and  metallic  glasses  are  the  most  notable  with 
those  researchers  specializing  in  the  latter  starting  their  regular  meetings  in  the  sixties. 

Halide  glasses  have  the  potential  for  application  as  ultra-low  loss  optical  fibers  operating  in  the  mid-IR  and 
nonlinear  optical  ranges,  while  "metglasses"  and  nowadays,  nanocrystalline  "finemetals"  have  already  found 
their  place  in  various  magnetics.  It  is  worth  noting  as  well,  the  preparation  of  glass-like  carbon  achieved  by 
solid  carbonization  of  thermosetting  resins  in  the  1960s  (once  used  for  inert  bio-implants)  which  appeared  in 
the  same  year  as  the  first  Au-Si  alloy  glass.  However,  there  was  almost  parallel  development  of  the  individual 
description  of  vitrification  and  crystallization  based  on  the  theories  of  nucleation  and  crystal  growth  which  was 
also  true  for  another  separate  group,  the  organic  and  polymeric  glasses.  Nowadays  the  theories  have  been 
unified. 


GLASS  TRANSITION  AND  PREVIOUS  THOUGHTS 

No consensus on terminology has been reached for the meaning of the glassy and the often synonymous amorphous
states [5] (of solids), the latter frequently employed by physicists to describe highly non-equilibrated structures
of quenched metals and semiconductors, while the former is preferred by chemists in the traditional field of
silicates  and  related  oxide  and  other  anionic  melts.  Amorphous  solids  can  be  prepared  by  any  generalized 
process  of  chemical  and/or  physical  disordering  to  exhibit  an  overlap  of  premature  crystallization  with  glass 
transformation.  This  is  somewhat  different  from  a  "more  thermodynamically-stable  state"  of  glasses  attainable 
by  repeated  vitrification  through  duplicated  (often  rapid)  liquid  cooling  which  should  always  be  accompanied 
by  a  more  or  less  discriminative  region  of  glass  transformation,  which  is  assumed  to  be  a  general  characteristic 
of  the  glassy  state. 

The  physical  appearance  of  a  glassy  and/or  amorphous  solid  looks,  however,  more  stable  than  that  of  an 
undercooled liquid, the latter being more easily convertible to the nearest stable state of a crystalline solid.
Consequently  we  also  come  to  the  discussion  of  the  term  "solid"  in  view  of  vitroids  within  the  framework  of 
rheology.  This  would  be  more  appropriate  for  glass  since  such  a  vitroid  changes  with  time  and  the  observation 
time  is  involved  in  detecting  the  extent  of  change.  Three  types  of  glasses  can  be  distinguished  by  the  previous 
idea  to  be  ranked  as  glassy  liquids,  glassy  (molecular)  crystals  and  glassy  liquid  crystals  characterized  by  their 
own  transformation  regions  related  to  dissipation  of  a  certain  "freedom".  In  this  view,  amorphous  solids 
actually  belong  to  a  fourth  limiting  case  depending  on  execution  of  the  disordering  process  itself.  In  this  light 
we can also consider formation of low-dimensional structures and sol-gels, the latter being particularly
subject to further classification due to the inability to clearly involve the glass transformation phenomenon.
Correlation between the characteristic temperatures and glass forming ability has been anticipated in the
form of reduced quantities. The ratio between Tg and Tmelt (= Tgr) should be roughly 2/3. A similar
behaviour relates to the ratio To/Tmelt (= Tor).
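As a minimal numerical sketch of the reduced quantities mentioned above, the following Python fragment computes a reduced temperature such as Tgr = Tg/Tmelt and, conversely, a rough estimate of Tg from the 2/3 rule. The function names and the example melting temperature are illustrative assumptions and are not taken from any measured system.

```python
def reduced_temperature(t_char_kelvin: float, t_melt_kelvin: float) -> float:
    """Return a characteristic temperature (e.g. Tg or To) divided by Tmelt."""
    if t_char_kelvin <= 0 or t_melt_kelvin <= 0:
        raise ValueError("temperatures must be positive absolute temperatures (K)")
    return t_char_kelvin / t_melt_kelvin


def estimate_tg_from_melting(t_melt_kelvin: float, t_gr: float = 2.0 / 3.0) -> float:
    """Rough Tg estimate from the empirical rule Tg/Tmelt ~ 2/3 (assumed default)."""
    if t_melt_kelvin <= 0:
        raise ValueError("melting temperature must be positive (K)")
    return t_gr * t_melt_kelvin


# Hypothetical melt with Tmelt = 1500 K (illustrative value only)
t_melt = 1500.0
t_g = estimate_tg_from_melting(t_melt)
print(f"Tg estimate: {t_g:.0f} K, Tgr = {reduced_temperature(t_g, t_melt):.3f}")
```

The 2/3 value is only a rule of thumb, so such an estimate is best treated as a starting point to be compared with a measured Tg.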

Let  us  repeat  some  traditional  energy  considerations  [5,6].  At  the  melting  point,  the  liquid  and  solid  have 
equal  Gibbs  energies  but  differ  in  the  enthalpy  and  entropy  contents.  Upon  cooling  below  the  melting  point 
the entropy of the undercooled liquid decreases more rapidly than that of a stable solid. Examining these
different  rates  of  entropy  loss,  we  can  determine  a  point  where  the  entire  entropy  of  melting  would  be 
diminished,  resulting  in  the  entropy  of  both  phases  becoming  identical  at  a  temperature  called  the 
Kauzmann  “pseudo-critical”  temperature,  still  lying  above  absolute  zero.  This  means  that  the  liquid  loses 
its  entropy  at  a  faster  rate  than  the  solid  and  if  the  liquid  maintains  configurational  equilibrium  on  very  slow 
cooling  to  the  region  where  it  attains  high  viscosity,  it  would  have  a  lower  entropy  than  the  solid.  Such  a 




state,  however,  is  unattainable  and  the  equilibrium-like  liquid  must  therefore  transform  into  rigid  glass  at  a 
pseudo-second  order  transformation.  Such  a  critical  trend  of  entropy  is  not  always  sufficiently  understood 
since  such  a  prior  intersection  by  liquid  vitrification  causes  the  heat  capacity  of  the  liquid  to  change 
abruptly to a value close to that of a corresponding solid. However, an unsolved question remains as to what
would happen if such an isoentropy-temperature of the so-called ideal glass transformation is nevertheless
attained  by  hypothetical  infinitesimal-slowing  of  the  cooling  rate,  thereby  avoiding  the  irreversible  freeze- 
in.  Although  this  is  an  imagination  game,  there  would  appear  to  be  a  kind  of  higher  order  transition  wherein 
the  heat  capacity  of  the  undercooled  liquid  changes  to  that  of  the  congruous  stable  crystalline  solid  and 
could  be  regarded  as  a  "fourth  state  of  matter". 
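The entropy argument above can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the heat capacity difference between the undercooled liquid and the crystal is constant below the melting point. Under that assumption the excess entropy of the liquid is ΔS(T) = ΔSm − ΔCp ln(Tmelt/T), which vanishes at the isoentropic temperature T_K = Tmelt exp(−ΔSm/ΔCp). The function names and all numerical values are hypothetical.

```python
import math


def excess_entropy(t: float, t_melt: float, ds_melt: float, dcp: float) -> float:
    """Entropy of the undercooled liquid minus that of the crystal at temperature t.

    Assumes a constant heat capacity difference dcp (liquid minus crystal).
    """
    return ds_melt - dcp * math.log(t_melt / t)


def kauzmann_temperature(t_melt: float, ds_melt: float, dcp: float) -> float:
    """Temperature at which the excess entropy would vanish (constant dcp assumed)."""
    return t_melt * math.exp(-ds_melt / dcp)


# Hypothetical values: Tmelt = 1000 K, entropy of melting 10 J/(mol K),
# heat capacity difference 20 J/(mol K)
t_k = kauzmann_temperature(1000.0, 10.0, 20.0)
print(f"isoentropic temperature: {t_k:.1f} K")
print(f"excess entropy there: {excess_entropy(t_k, 1000.0, 10.0, 20.0):.2e}")
```

With a positive ΔCp the excess entropy falls monotonically on cooling, so the isoentropic point always lies between absolute zero and the melting point, in line with the argument in the text.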

The  viscosity  of  liquid  can  be  regarded  as  a  reflection  of  the  relation  between  the  thermal  energy  available 
at  a  given  temperature  and  the  strength  of  the  forces  pulling  the  species  together  and  restructuring  their 
position  in  a  given  volume  within  which  molecular  rearrangement  can  occur.  The  possible  rate  of  these 
rearrangements  rapidly  decreases  with  decreasing  volume  within  which  the  species  are  packed.  The  volume 
is  determined  by  the  strength  of  the  attractive  forces  and  is  reflected  in  the  characteristic  temperatures 
(melting,  critical  points,  etc.).  The  more  strongly  the  components  interact,  the  more  rapidly  the  freezing 
point  of  the  solvent  is  depressed  and  the  viscosity  is  increased,  consequently  slowing  perturbing  nucleation, 
but this should not be so strong as to generate a new competing crystalline phase. Zachariasen's rules can thus
be  understood  to  predict  low  melting  points  relative  to  the  forces  acting  between  the  species,  although  some 
newly  developed  glasses  violate  these  predictions. 


ANHARMONICITY  VIBRATIONAL  APPROACH  TOWARDS  THE  CREATION 
OF  FREE  VOLUME  IN  THE  VICINITY  OF  GLASS  TRANSITION 

Entropy  Considerations 

As  shown  above,  the  problem  of  explaining  Tg  (glass  transition)  for  a  long  time  was  presumed  to  be  mainly 
related  to  the  configurational  or  conformational  changes  of  entropy.  Therefore  we  would  like  to  concentrate 
our  effort  on  the  clarification  of  Tg  transition  through  anharmonic  vibrations  [7,8]  and  drastic  amplitude 
changes  in  the  vicinity  of  Tg  .  Such  an  isolated  non-linear  oscillator  can  reveal  double  frequencies  and 
pulses.  If  the  individual  oscillators  can  interact  on  similar  frequencies,  then  the  individual  particles  can 
undergo  a  discontinuity  in  amplitudes  (so-called  amplitude  jump).  In  the  liquid  state,  such  an  amplitude 
jump  of  a  monomer  or  dimer  unit  would  push  aside  particles  in  the  vicinity  forming  an  opening  for  a 
vacancy  space.  Enlargement  of  the  amplitude  can  be  detected  by  means  of  methods  of  neutron-scattering 
which  can  provide  the  information  about  average  vibrational  amplitude  (the  Debye-Waller  factor). 
Occasional misapprehension passed on in early work with Tg transitions had its source in the disregard of
two facts: (1) omission of a proper definition of “α” known in studies in solid state physics and (2) over-emphasis
of the meaning of the configurational and/or conformational part of entropy in the following
equation as found in the vast majority of cases:

$$S = k \ln W_{conf} + k \ln W_{therm} \qquad (1)$$

where  k  stands  for  the  Boltzmann  constant.  In  our  theory,  we  suppose  that  the  Tg  transition  is  connected  to 
the  release  of  vibrational  motion  of  monomer  or  dimer  units  in  the  rotational  sense  as,  e.g.,  the  spinning  of  a 
benzene  ring  around  vinyl  groups  in  a  polystyrene  chain.  Above  the  Tg  region,  the  vacancy  thus  created  can 
occupy a volume larger than 10 Å^3. For inorganic glasses, the substructure of the SiO4 tetrahedron is
reputed to be released in the vicinity of Tg. Under the temperature (Tg = 52 K), only a very small part of the
particles can undergo a finite displacement. About 10^-5 tunnelling states per atom and about 10^-3 states
connected  to  boson  peak  at  very  low  temperatures  can  be  presumed  if  we  disregard  the  motion  of  side 
chains.  Below  Tg ,  Equation  1  becomes  : 

$$S = k \ln W_{therm} \qquad (2)$$

as a consequence of Wconf = 1. The Kauzmann paradox of negative entropy (the so-called entropy crisis) can
thus  never  occur,  because  one  part  of  entropy  which  should  participate  under  Tg  just  disappears.  For 




polymers  the  value  of  cp  (heat  capacity)  per  atom  is  approximately  "k"  to  "2k"  forming  a  sort  of  analogy  to 
the  Dulong-Petit  rule  for  metals.  The  potential  valley,  in  which  the  individual  particle  is  supposed  to 
undergo  vibrational  motion,  can  be  written  as  follows: 

$$U - U_0 = \frac{1}{2} kT = \frac{1}{2} f \xi^2, \quad \text{where } kT/f = \xi^2 \qquad (3)$$

In Equation 3, U0 is the reference energy level which can be taken as equal to zero and ξ = r − r0 is the
deviation from the bottom of the potential valley, while k is the Boltzmann constant and f is related to the bulk
compressibility modulus K = f/r0.


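For the non-linear form of a potential valley, the relationship becomes:

$$U - U_0 = \frac{1}{2} f \xi^2 - \frac{1}{3} g \xi^3 \qquad (4)$$

The non-zero coefficient of thermal expansion can be defined as:

$$\alpha = \frac{1}{r_0} \frac{d\bar{\xi}}{dT} = \frac{g k}{r_0 f^2} \qquad (5)$$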


It is evident that this definition does not need to be perfect. However, it has already been shown that the
inclusion of a higher term in the power series development, analogous to Equation 4, would not bring any
difference into the definition of α:

$$U - U_0 = \frac{1}{2} f \xi^2 - \frac{1}{3} g \xi^3 - \frac{1}{4} h \xi^4 \qquad (6)$$

This  type  of  potential  valley  is  generally  considered  in  the  basic  physics  of  inorganic  glasses.  Usually  the 
authors do not consider possible interactions of particles with particles located in the neighbourhood. Such
interactions with a nearby resident can bring the isolated particle to a completely different level of ξ, for
which the particle, if left subsequently isolated, must associate with a different anharmonicity level
characterised through a completely different ratio of g/f^2 functions. This stems from Equations 3 and 5 as
well as from the expression for the average force acting on the isolated particle:

$$F_{AV} = f \bar{\xi} - g \overline{\xi^2} = 0 \quad \text{and} \quad \bar{\xi} = \frac{g k}{f^2} T \qquad (7)$$
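The link between the mean displacement and the ratio g/f^2 can be checked numerically. The sketch below assumes the cubic potential U − U0 = (1/2) f ξ^2 − (1/3) g ξ^3, classical Boltzmann weighting, and dimensionless illustrative parameter values that are not taken from any real glass. The integration is truncated well inside the potential barrier at ξ = f/g, because the cubic potential is not confining beyond it.

```python
import math


def mean_displacement(f, g, kT, half_width_sigmas=10.0, n=20001):
    """Boltzmann-averaged displacement in U = f*xi^2/2 - g*xi^3/3 (trapezoid rule)."""
    sigma = math.sqrt(kT / f)            # harmonic r.m.s. displacement
    hi = half_width_sigmas * sigma
    lo = -hi
    if g > 0 and hi >= f / g:
        raise ValueError("integration window reaches the barrier at xi = f/g")
    h = (hi - lo) / (n - 1)
    num = den = 0.0
    for i in range(n):
        xi = lo + i * h
        u = 0.5 * f * xi**2 - g * xi**3 / 3.0
        w = math.exp(-u / kT)
        if i in (0, n - 1):              # trapezoid end-point weights
            w *= 0.5
        num += xi * w
        den += w
    return num / den


def expansion_coefficient(f, g, k, T, r0, rel_step=0.05):
    """alpha = (1/r0) d<xi>/dT, estimated by a central finite difference."""
    dT = rel_step * T
    x_hi = mean_displacement(f, g, k * (T + dT))
    x_lo = mean_displacement(f, g, k * (T - dT))
    return (x_hi - x_lo) / (2.0 * dT) / r0


# Illustrative dimensionless parameters (assumed): f = 1, g = 0.1, k = 1, r0 = 1
f, g, k, T, r0 = 1.0, 0.1, 1.0, 0.01, 1.0
print("numerical <xi>    :", mean_displacement(f, g, k * T))
print("g k T / f^2       :", g * k * T / f**2)
print("numerical alpha   :", expansion_coefficient(f, g, k, T, r0))
print("g k / (r0 f^2)    :", g * k / (r0 * f**2))
```

For weak anharmonicity the numerical values agree closely with g k T / f^2 and g k / (r0 f^2), and for g = 0 the mean displacement and hence the expansion coefficient vanish.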

The non-linearity can also be accounted for through the variation coefficients of the second order differential
equation  together  with  the  addition  of  the  right-hand  side  to  yield  Equation  8.  A  variety  of  particle 
interactions  can  be  considered.  For  particle  motion  in  the  potential  valley,  we  get: 


$$m \frac{d^2\xi}{dt^2} + F\left(\frac{d\xi}{dt}\right) \frac{d\xi}{dt} + \dots = \dots \cos pt \qquad (8)$$


where the angular frequency "p" is presumed to be close to the characteristic "eigen" frequency "ω" of free
vibrations. The right hand side of the equation stands for particle interactions with its neighbours. By using
mathematical methods, the non-linear second order differential equation can be turned into
two separate first order differential equations with variables A1 and A2 (corresponding, for example, to A1 =
dξ/dt and A2 = ξ = f(A1; t; α; ... etc.)), and subsequently even the time dependence can be eliminated. We
can get:

$$\frac{dA_1}{dt} = \alpha_{11} A_1 + \alpha_{12} A_2, \qquad \frac{dA_2}{dt} = \alpha_{21} A_1 + \alpha_{22} A_2 \qquad (9)$$


and choosing, e.g., ... = −α11 + α22, we can arrive at the time-independent amplitude representation of
the problem [8] where the type of motion is defined through different ratios of constants α1 and α2. The
neutron scattering data [9] present the most convincing evidence of the average amplitude rise in the vicinity
of Tg. The average amplitude of vibrations starts to rise slowly at Vogel's temperature and at T = Tcr (the
so-called crossover temperature, Tcr ≈ 1.2 Tg), the constant slope of the average amplitude rise is
established (for the region T > Tcr). We assume that vacancies are created in the liquid matrix through the




high  amplitude  motion  of  the  particles.  In  such  a  case  a  vibrating  particle  is  able  to  push  aside  neighbouring 
particles as experimentally evidenced with cis-1,4-poly(butadiene) [10]. The basic conclusion of the theory
is  that  the  amplitude  change  would  play  a  governing  role  in  the  definition  of  a  liquid  state  and  in  its 
transition  into  the  solid  or  the  glassy  state.  The  rise  in  vibrational  amplitude  is  the  major  reason  for  solid 
and/or  liquid  volume  enlargement.  These  expansions  proceed  either  through  continuous  changes  or  through 
a  sharp  discontinuity.  In  such  a  way  the  non-linear,  mutually  interactive,  oscillator  system  can  successfully 
cope  with  the  first  order  as  well  as  with  the  second  order  transitions  which  take  place  in  the  liquid  state. 
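The reduction of the second-order equation of motion to two first-order equations, as described above, can be sketched numerically. The example below is not the authors' model: it assumes a driven oscillator m ξ'' + c ξ' + f ξ − g ξ^2 = P cos(pt), with a constant damping coefficient c and a driving amplitude P introduced only for illustration, and integrates it with a classical fourth-order Runge-Kutta scheme using A1 = dξ/dt and A2 = ξ. All parameter values are hypothetical and dimensionless.

```python
import math


def simulate(m=1.0, c=0.2, f=1.0, g=0.1, P=0.1, p=0.8, dt=0.01, t_end=200.0):
    """Integrate the driven anharmonic oscillator as two first-order equations."""

    def rhs(t, a1, a2):
        # A1 = d(xi)/dt, A2 = xi
        da1 = (P * math.cos(p * t) - c * a1 - f * a2 + g * a2 * a2) / m
        da2 = a1
        return da1, da2

    n_steps = int(round(t_end / dt))
    t, a1, a2 = 0.0, 0.0, 0.0
    trajectory = []
    for _ in range(n_steps):
        k1 = rhs(t, a1, a2)
        k2 = rhs(t + dt / 2, a1 + dt / 2 * k1[0], a2 + dt / 2 * k1[1])
        k3 = rhs(t + dt / 2, a1 + dt / 2 * k2[0], a2 + dt / 2 * k2[1])
        k4 = rhs(t + dt, a1 + dt * k3[0], a2 + dt * k3[1])
        a1 += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        a2 += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
        trajectory.append((t, a1, a2))
    return trajectory


def steady_amplitude(trajectory, t_start):
    """Largest |xi| after the transient has decayed (t >= t_start)."""
    return max(abs(a2) for t, _, a2 in trajectory if t >= t_start)


# Compare the harmonic (g = 0) and weakly anharmonic (g = 0.1) responses
print("harmonic amplitude   :", steady_amplitude(simulate(g=0.0), 150.0))
print("anharmonic amplitude :", steady_amplitude(simulate(g=0.1), 150.0))
```

In the harmonic limit g = 0 the computed amplitude can be compared with the textbook value P / sqrt((f − m p^2)^2 + (c p)^2), which gives a simple check of the integrator before the anharmonic term is switched on.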


THE  SCIENCE  AND  HORIZON  OF  NONCRYSTALLINE  STATES 

Yet more study [6,11] must be directed to ascertain near-glass-transformations and pseudo-glass-transformations
in order to study intermediate states between amorphous and glassy solids in the sub-glass
transformation region. Among these should be order/disorder changes of deposited tetrahedral and amorphous
carbon or a pronounced short and/or medium ordering in the as-quenched and amorphised alloys. Progressive
study  of  vibrational  states  of  silicon  in  the  crystalline  and  amorphous  forms  as  well  as  the  associated  void 
formation  seems  to  be  of  no-less  importance  for  a  better  understanding  of  the  higher  density  of  amorphous 
forms.  Inelastic  neutron  scattering  will  assist  in  observing  the  nature  of  hydrogenated  amorphous  silicon  when 
investigating,  e.g.,  the  bond  type  (so  far  single,  double,  but  not  triple)  of  hydrogenated  and  deuterated  samples 
as  well  as  the  effect  of  hydroxylation.  An  attempt  at  forming  a  nanocrystalline  theory  of  photoluminescence  is 
also  foreseen  to  guide  technologists  in  preparing  a  comparable  material  by  controlled  nucleation  of  laser  glazed 
surfaces  or  even  sol-gel  precursor  samples.  There  is  a  certain  hope  of  tailoring  the  multilayer  silicon-silica 
sandwiches  instead  of  using  conventional  electrolysis  of  high  voltage  discharging. 

Interesting  attention  is  still  to  be  expected  in  the  study  of  intermediate  states  between  glass,  liquid  and  crystal, 
i.e.,  the  architecture  of  ordered  crystallites  at  a  submicron  scale  relevant  to  the  medium-range  order  that  exists 
in  the  glassy  state  and  which  become  prenucleation  stages  in  generalized  precursor  liquids.  This  will  be 
continued  by  studying  the  anomalous  small/wide  angle  x-ray  diffraction  (ASAXS  and  AWAXS)  using  an 
NMR  insight  into  the  medium-range  ordering  and  microscopic  mechanism  of  diffusion  and  viscous  flow  in 
precursor  melts  for  a  better  understanding  of  bonds  between  cations  and  anions,  the  spectroscopy  of 
substructures  using,  e.g.,  the  still  traditional  concept  of  bridging,  half-  and  non-bridging  oxygen  (regarding  e.g. 
biocompatibility)  or  interaction  of  metallic  and  metalloid  species  in  metglasses  to  possibly  explain  the  role  of 
thermal  history,  recreated  medium-range  order  and  the  effects  of  modifying  admixtures.  The  successful 
assistance of theoretical treatises based on classical molecular dynamics and dynamic simulation of electronic
ground states and of topological restructuring (low temperature annealing process) remains indispensable (ab-initio
molecular dynamic techniques, reverse Monte Carlo, etc.).

For  a  progressive  tailoring  of  magnetic  as  well  as  of  mechanical,  ferroelectric  and  dielectric  properties, 
attention  is  paid  again  to  the  medium  ordering  states  because,  e.g.,  the  extent  of  magnetic  exchange  interactions 
is  effective  across  a  given  width  of  magnetic  domain  walls,  and  the  disordered  nanocrystallites  of  a  subcritical 
domain size ('finemetals') would thus appear as magnetically disordered in a similar way as truly noncrystalline,
yet  classical,  'metglasses'.  Similarly,  this  may  bring  new  dimensions  to  nonlinear  optoelectronics  where  again 
noncrystallite  waveguides  can  eventually  play  an  important  role  in  infrared  optics.  Silica  glass  fibers  could 
cause  the  frequency  doubling  of  infrared  laser  beams  suggesting  that  even  a  noncrystalline  solid  can  have  large 
second-order  susceptibilities.  Oxide  glasses  also  serve  as  useful  transparent  matrices  for  semiconductor 
microcrystallites  to  form  nanocomposites  with  large  third-order  susceptibilities.  Controlled  uniform  size 
distribution  of  quantum  dots  is  needed  for  such  nonlinear  devices  and  soliton  switching  as  well  as  waveguide 
lasers  while  nonuniformity  is  required  in  applications  of  optical  data  storage.  Submicron  crystallites  of  halides 
in  composite  glassy  electrolytes  essentially  increase  ionic  conductivity,  and  nanometric  pinning  centers 
improve  superconductivity  of  complex  cuprates.  Nanocrystallization  of  porous  silicon  plays  an  important  role 
in better managing of photoluminescence when taking into account the properties of silica, as the separating
interfaces  of  silicon  grains  were  recently  shown  to  be  responsible  for  blue  photoluminescence,  their  quality 
dependent  on  the  nano-sized  separating  layers  which  should  remain,  according  to  early  studies  carried  out  on 
inorganic  and  organic  silanes. 




To  speak  of  another  important  area,  that  of  superalloys,  we  can  briefly  cite  the  importance  of  inhibition  of  any 
subcritical  nuclei  formation  in  such  diverse  fields  as  biology,  to  mention  cryo-preservation  of  viruses  and/or 
growth  of  faults  (diseases,  e.g.,  cancer)  in  preventative  medicine.  Self-protection  of  plants  against  freezing  is 
another  example  taking  place  by  the  process  of  drying  (fluid  concentrational  changes).  Oxide  gels  and 
organically-modified silica gels (ormosils) should not be forgotten as well-known hosts for nanocrystallites
which,  in  combination  with  optically-active  polymers,  can  provide  high  third-order  optical  materials. 

Order/disorder  phenomena  in  systems  with  lower  dimensions  are  separate  emerging  fields  providing  new 
boundary  problems  such  as  nanometer  range  phase  separation  in  thin  amorphous  films  prepared  by  CVD  as 
known  for  germanium.  It  touches  as  remote  a  material  area  as  non-stoichiometric  semiconductors  prepared  via 
nonequilibrium MBE or MOCVD, e.g., semi-insulating GaAs where Ga, substituted in As-regular sites, produces
As-vacancies  acting  as  deep-level  electron  traps.  These  matrices  are  generally  understood  as  submerged 
disordered  systems  of  defects  with  nanocrystalline  dimensions.  When  one  characteristic  dimension  is  of  the 
order  of  the  electron  wave-length,  quantum  electron  phenomena  (i.e.  dimensional  absence  of  electron 
resistance)  become  important.  These  are  known  as  quantum  wells,  wires  and/or  dots.  If  for  an  appropriate 
thickness  of  a  semiconductor  layer,  disorder  of  the  interface  is  controlled  by  remote  doping,  a  high  mobility 
transistor  (HEMT)  function  is  achieved  on  the  basis  of  the  quantum  well.  A  comparable  but  almost  zero 
dimensional  fluctuation  can  be  created  across  the  dividing  insulating  layer  by  formation  of  quantum  dot  arrays 
prepared  either  by  semiconductor  layer  etching  or  by  random  chemisorption  (chemical  FET).  Quantum  dots 
can also be conventionally formed by dispersion in a suitable matrix; their optimum size is estimated from the
ratio of material permittivity to effective mass. Such a field, apparently remote from the traditional glass field,
may become a boundary area for theoreticians when assuming that, during slow cooling of a single crystal,
non-equilibrium and relatively large-scale fluctuations are created. Thus even highly ordered structures with a low
dopant  concentration  show  positioning  comparable  to  nanometric,  medium  ordered,  modulated  structures. 

The  functional  utility  of  such  newly  ranked  materials  need  not  be  appreciated  right  away  before  their  properties 
are adequately characterized. Rapid solidification, already common, has played a key role in the discovery of
quasicrystals, a class of materials neither exactly crystalline nor noncrystalline, and in the ongoing
reexamination  of  the  basic  principles  of  crystallography.  Closely  related  stereochemical  models,  where  so- 
called  order  within  disorder  is  a  reliable  approach  considering  modulus  as  a  probable  measure  of  structural 
order,  question  the  classical  models  of  crystallography  versus  noncrystallography.  Breakthroughs,  such  as  the 
recent discovery of the quantum Hall effect or high-temperature superconductivity, cannot be predicted or
planned  for  the  next  millennium.  The  case  is  similar  for  any  possible  prospect  of  the  above  discussed  branches 
of  material  science  related  to  the  discussed  glasses  and  noncrystalline  and  low-dimensional  structure  in  general. 

ACKNOWLEDGEMENT 

This study was carried out with the financial support of the Grant Agency of the Czech Republic, grant 106/97/0589.


REFERENCES 

1. Barrington-Haynes, E., 1948. Glass through the Ages, Penguin, Harmondsworth.

2. Phillips, C.J., 1948. Glass the Miracle Maker, Pitman & Sons, London.

3. Bouska, V., 1993. Natural Glasses, Academia, Praha.

4. Zarzycki, J. (ed.), 1991. Material Science and Technology: Glasses and Amorphous Materials, VCH, Weinheim.

5. Sestak, J., 1985. Thermochim. Acta 95, 459.

6. Sestak, J., 1997. Glastech. Ber. Glass Sci. Tech. 70C, 153.

7. Hlavacek, B., Kresalek, V., Soucek, J., 1997. J. Chem. Phys. 107, 4658 & 1996. Thermochim. Acta 280/281, 417.

8. Hlavacek, B., Sestak, J., 1999. Theory of anharmonicity vibrational approach towards the creation of free volume in the vicinity of glass transition, Keynote Lecture at Glass99, Prague (in press).

9. Buchenau, U., Zorn, M., 1992. Europhys. Lett. 18, 523.

10. Bartos, J., et al., 1997. Macromolecules 30, 6912 & Physica B 234/236, 435.

11. Sestak, J., in: Chvoj, Z., Sestak, J., Triska, A. (eds.), 1991. Kinetic Phase Diagrams: Non-equilibrium Phase Transformations, Elsevier, Amsterdam.




Oxygen  Solubility  Modeling  in  Aqueous  Solutions 

Desmond  Tromans 

Department  of  Metals  and  Materials  Engineering,  and  Pulp  and  Paper  Centre, 
University  of  British  Columbia,  Vancouver,  BC  V6T  1Z4,  Canada. 


ABSTRACT 

Oxygen dissolved in the aqueous phase, $(\mathrm{O_2})_{aq}$, is an important oxidant in many industrial processes, ranging
from pressure leaching and heap leaching of metals from minerals to the bleaching of wood fibers in the
pulp and paper industry. Frequently, $(\mathrm{O_2})_{aq}$ is a prime agent promoting corrosion of metals in aqueous
systems. This study presents a general solubility model for estimating oxygen solubility in aqueous
inorganic solutions over a wide range of conditions. These include changes in oxygen partial pressure $P_{O_2}$
(atm), variations in the process temperature $T$ (K), and changing concentrations $C_I$ of dissociated inorganic
solute $I$. The model is based on a thermodynamic analysis showing that the concentration $c_{aq}$ of $(\mathrm{O_2})_{aq}$ in
pure water is dependent upon $P_{O_2}$ and $T$ via an equation of the form $c_{aq} = P_{O_2} f(T)$, where $f(T)$ is a
$T$-dependent function related to the chemical potential, entropy, and partial molar heat capacity of the gaseous
oxygen $(\mathrm{O_2})_g$ and dissolved $(\mathrm{O_2})_{aq}$ species. In the presence of a single $I$, this equation is modified by a
$\phi$-factor such that the new oxygen solubility, $(c_{aq})_I$, becomes $(c_{aq})_I = \phi c_{aq} = \phi P_{O_2} f(T)$, where $\phi$ is an
$I$-dependent function of $C_I$. Inorganic solutes of similar stoichiometry, composed of a common anion and
having cations from the same Group in the Periodic Table, tend to exhibit a similar $\phi$-factor and $(c_{aq})_I$
value, provided all concentrations, $c_{aq}$, $(c_{aq})_I$, and $C_I$, are reported in molal (m) units (mol/kg H₂O). Methods
for incorporating the effect of multiple $I$ on $\phi$ are presented and discussed.

INTRODUCTION 

Many industrial oxidation processes rely upon the presence of dissolved oxygen, $(\mathrm{O_2})_{aq}$, to accomplish
oxidation, including leaching of minerals from ores and oxygen bleaching of pulp. In other situations $(\mathrm{O_2})_{aq}$
may have undesirable effects, such as corrosion of metals. Frequently, these processes fall under mass
transport control of $(\mathrm{O_2})_{aq}$ to the reacting surface. Consequently, higher oxygen solubility enhances mass
transport and increases oxidation rates. Thus, a quantitative and predictive knowledge of oxygen solubility
is desirable, particularly as it is affected by such process variables as temperature $T$ (K), partial pressure
$P_{O_2}$ of oxygen in the gas phase $(\mathrm{O_2})_g$, and the concentration $C_I$ of inorganic solutes $I$.

Measurements of oxygen solubility in water and aqueous solutions have been made for many decades [1].
Until recently, little progress was made towards the development of a unifying and predictive equation
(model) that combined the conjoint effects of $T$, $P_{O_2}$, and $C_I$. This was due partly to the variety of different
oxygen and solute solubility units that were used and to considerable empiricism in reported
relationships. Recent analytical modeling studies by the author [2] have shown that a rigorous
thermodynamics-based approach leads to a unifying equation for the concentration $c_{aq}$ of $(\mathrm{O_2})_{aq}$ in pure
water. Good agreement between the unifying equation and published data was evident when such data
were converted to the same set of thermodynamic units as those used in the model. This model was then
modified to incorporate the effects of different $I$ and shown to have the potential for predicting (estimating)
oxygen solubility $(c_{aq})_I$ in $I$-containing solutions of industrial relevance [3]. Predictive capabilities are
necessary for the utility of any model for process development, in order to avoid the prohibitive time and
costs involved in the measurement of $(\mathrm{O_2})_{aq}$ solubility for different combinations of $T$, $P_{O_2}$ and $C_I$.

0-7803-5489-3/99/$ 10.00 ©1999 IEEE.

The oxygen solubility model for pure water will be outlined first, followed by a modification to account for
the effects of $I$. Subsequently, examples of the predictive capability of the model will be presented and
discussed. All thermodynamic units are consistent with those recommended by the International Union of
Pure and Applied Chemistry (IUPAC), e.g. molal (m) units (mol/kg H₂O) for the concentrations $c_{aq}$, $(c_{aq})_I$,
and $C_I$; kelvin (K) for $T$; and atmospheres (atm) for $P_{O_2}$, where 1 atm = 101.325 kPa. Methods
for converting concentrations of $(\mathrm{O_2})_{aq}$ and $I$ from other units to m have been described previously [2,3], including
use of the International Critical Tables [4] to convert molar (M) units (mol/liter of solution) to m.
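As a rough illustration of the molar-to-molal conversion mentioned above, the sketch below uses the standard relation between molarity, solution density and molality. It is not the tabulated procedure of [2-4]; the function name and the density value (about 1.038 kg/L for 1 M NaCl) are assumptions made only for this example.

```python
def molar_to_molal(molarity, density_kg_per_l, molar_mass_g_per_mol):
    """Convert mol per liter of solution (M) to mol per kg of water (m).

    Assumes a single solute; the solution density must be supplied
    (for example from tabulated data such as the International Critical Tables).
    """
    solute_kg_per_l = molarity * molar_mass_g_per_mol / 1000.0
    water_kg_per_l = density_kg_per_l - solute_kg_per_l
    if water_kg_per_l <= 0.0:
        raise ValueError("density is too low for the stated molarity")
    return molarity / water_kg_per_l

# Illustrative input: 1 M NaCl, assumed density 1.038 kg/L, molar mass 58.44 g/mol
print(molar_to_molal(1.0, 1.038, 58.44))
```

For dilute solutions the two scales nearly coincide; the difference grows with concentration, which is why all data are placed on the molal scale before comparison with the model.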


OXYGEN  IN  PURE  WATER 

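Equilibrium between $(\mathrm{O_2})_g$ and $(\mathrm{O_2})_{aq}$ is given by Eq. 1,

$$(\mathrm{O_2})_g = (\mathrm{O_2})_{aq}, \qquad k = \frac{[\mathrm{O_2}]_{aq}}{[\mathrm{O_2}]_g} = \frac{\alpha c_{aq}}{\gamma P_{O_2}} \tag{1}$$

where $k$ is the equilibrium constant, square brackets [ ] denote activity, $\alpha$ is the activity coefficient of
$(\mathrm{O_2})_{aq}$, and $\gamma$ is the fugacity coefficient of $(\mathrm{O_2})_g$.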

The value of $k$ at any $T$ is related to the standard molar chemical potentials $\mu^\circ_{aq}$ and $\mu^\circ_g$ of the $(\mathrm{O_2})_{aq}$ and
$(\mathrm{O_2})_g$ species, respectively, at that temperature and the overall change in chemical free energy of the
reaction $\Delta G^\circ$ via Eq. 2, leading to Eq. 3,

$$\Delta G^\circ = \mu^\circ_{aq} - \mu^\circ_g = -RT \ln k \tag{2}$$

$$k = \exp(-\Delta G^\circ / RT) = \exp\{(\mu^\circ_g - \mu^\circ_{aq})/RT\} \tag{3}$$

where $R$ is the gas constant (8.3144 J.mol$^{-1}$.K$^{-1}$).

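The $T$-dependence of $k$ (and $c_{aq}$) is controlled by the effect of $T$ on the exponential function in Eq. 3. The
value of $\mu^\circ$ for a single species at $T_2$ is related to that at a reference $T_1$ by Eq. 4 [2],

$$(\mu^\circ)_{T_2} = (\mu^\circ)_{T_1} + \int_{T_1}^{T_2} C_P\,dT - T_2 \int_{T_1}^{T_2} \frac{C_P}{T}\,dT - S^\circ_{T_1}(T_2 - T_1) \tag{4}$$

where $C_P$ is the molar heat capacity of the species at constant pressure and $S^\circ_{T_1}$ is its standard entropy at $T_1$.

Hence from known $\mu^\circ$ and $S^\circ$ data at $T_1$ the new chemical potentials of each species may be calculated at $T_2$
to give the new equilibrium constant $k_{T_2}$ at $T_2$,

$$k_{T_2} = \exp\left\{\frac{-(\Delta G^\circ)_{T_2}}{RT_2}\right\} = \exp\left\{\frac{(\mu^\circ_g - \mu^\circ_{aq})_{T_2}}{RT_2}\right\} \tag{5}$$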


The most common reference $T_1$ for which considerable thermodynamic data have been determined is 298 K
(25°C). Standard $\mu^\circ$ and $S^\circ$ values for the gaseous and dissolved oxygen species at 298 K are listed in Table
1, together with the data sources [2,5].

Table 1. Standard μ° and S° values of the oxygen species at 298 K.

Species    μ° (kJ.mol⁻¹)    S° (J.mol⁻¹.K⁻¹)    Reference
(O₂)g      0                205.028             Hoare [5]
(O₂)aq     +16.506          +109                Tromans [2]

Based on the reported $T$-dependent variation in the molar heat capacity $(C_P)_g$ of $(\mathrm{O_2})_g$ [6,7], it was found
that $(C_P)_g$ could be represented by a linear function of $T$ in the range 273-650 K [2],

$$(C_P)_g = +26.65 + (9 \times 10^{-3})T, \quad \mathrm{J.mol^{-1}.K^{-1}} \tag{6}$$

Also, analysis of reported oxygen solubility behavior by the author [2] showed that the molar heat capacity
$(C_P)_{aq}$ of $(\mathrm{O_2})_{aq}$ could be represented by a linear function of $T$,

$$(C_P)_{aq} = +230 - (8.3 \times 10^{-2})T, \quad \mathrm{J.mol^{-1}.K^{-1}} \tag{7}$$

The much higher value of $(C_P)_{aq}$ relative to $(C_P)_g$ indicates that changes in molecular rotations and bond
vibrations have been caused by interactions (disturbances) between $(\mathrm{O_2})_{aq}$ molecules and the surrounding
water molecules. The decreasing $(C_P)_{aq}$ with increasing $T$ is then seen to be consistent with the decreasing
interactions between water molecules (i.e. diminishing hydrogen bonding) with increasing $T$, as revealed by
the decrease in activation energies of viscosity and self diffusion of water with increasing $T$ [8].
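A minimal sketch of how Eqs. 3 to 7 and the Table 1 data combine is given below. Because both heat capacities are linear in $T$ ($C_P = a + bT$), the two integrals in Eq. 4 have closed forms. The function and variable names are illustrative assumptions, not part of the original work.

```python
import math

R = 8.3144   # gas constant, J/(mol K)
T1 = 298.0   # reference temperature, K

# Table 1 data at 298 K (mu in J/mol, S in J/(mol K)) and the linear
# heat capacities C_P = a + b*T of Eqs. 6 and 7
GAS = {"mu": 0.0,     "S": 205.028, "a": 26.65, "b": 9.0e-3}
AQ  = {"mu": 16506.0, "S": 109.0,   "a": 230.0, "b": -8.3e-2}

def mu_at(species, T2):
    """Standard chemical potential (J/mol) at T2 from Eq. 4 with C_P = a + b*T."""
    a, b = species["a"], species["b"]
    int_cp = a * (T2 - T1) + 0.5 * b * (T2**2 - T1**2)      # integral of C_P dT
    int_cp_over_t = a * math.log(T2 / T1) + b * (T2 - T1)   # integral of (C_P/T) dT
    return species["mu"] + int_cp - T2 * int_cp_over_t - species["S"] * (T2 - T1)

def k_from_thermo(T):
    """Equilibrium constant k (mol/kg/atm) at temperature T from Eq. 3."""
    return math.exp((mu_at(GAS, T) - mu_at(AQ, T)) / (R * T))

for T in (298.0, 323.0, 373.0):
    print(T, k_from_thermo(T))
```

At the reference temperature the integrals vanish and $k$ reduces to $\exp(-16506/RT_1)$, which is a convenient check on the sign conventions.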

From Eqs. 4 to 7, together with the data in Table 1 and using a reference temperature $T_1$ of 298 K, it is a
straightforward arithmetical exercise to show [2] that the value of $k$ at any arbitrary value of $T$ (equivalent
to $T_2$) is given by the function of $T$, $f(T)$, shown in Eq. 8 [2],

$$k = f(T) = \exp\left\{\frac{0.046T^2 + 203.35\,T\ln(T/298) - (299.378 + 0.092T)(T - 298) - 20.591 \times 10^3}{(8.3144)T}\right\} \tag{8}$$

Combining Eqs. 1 and 8,

$$c_{aq}(\alpha/\gamma) = P_{O_2}k = P_{O_2}f(T) \tag{9}$$

Furthermore, it was shown previously [2] that for $P_{O_2}$ up to ~60 atm and $T$ from 273 to 616 K, $\alpha/\gamma$ may
be closely approximated to unity, so that the final model equation for oxygen solubility becomes

$$c_{aq} = P_{O_2}k = P_{O_2}f(T) \quad \text{or} \quad c_{aq}/P_{O_2} = k = f(T) \tag{10}$$

where $f(T)$ is given by Eq. 8.
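Equations 8 and 10 translate directly into a few lines of code. The sketch below is an illustration only: the function names, the molar mass of O₂ (31.998 g/mol, not given in the text) and the air-equilibrium partial pressure of 0.21 atm are assumptions introduced for the example.

```python
import math

def f_T(T):
    """Eq. 8: k = f(T) in mol/kg/atm, with T in kelvin."""
    numerator = (0.046 * T**2
                 + 203.35 * T * math.log(T / 298.0)
                 - (299.378 + 0.092 * T) * (T - 298.0)
                 - 20.591e3)
    return math.exp(numerator / (8.3144 * T))

def c_aq(p_o2_atm, T):
    """Eq. 10: molal oxygen solubility in pure water (alpha/gamma taken as 1)."""
    return p_o2_atm * f_T(T)

M_O2 = 31.998  # g/mol, molar mass of O2 (assumed here, not from the text)

# Water in equilibrium with air (P_O2 of about 0.21 atm, illustrative) at 298 K
c = c_aq(0.21, 298.0)
print(f"c_aq = {c:.3e} mol/kg ({c * M_O2 * 1000:.2f} mg O2 per kg H2O)")
```

For air-saturated water at 298 K this gives roughly 8.6 mg of O₂ per kg of water, in line with commonly quoted values for dissolved oxygen at room temperature.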


The predictions of Eq. 10 compare very favorably with published solubility data [9-14] in Fig. 1, after
converting all data to molal (m) concentrations. The data include $P_{O_2}$ up to ~60 atm and $T$ up to 616 K.


[Figure 1: plot against T (K) from 300 to 600 K, vertical axis from 0 to 0.008. Legend: Hayduk; Pray et al.; Broden and Simonson; Stephan et al.; Battino / Benson et al.; Equation 10.]

Fig. 1. Comparison of model Equation 10 with published oxygen solubility data in water.


SOLUTIONS  CONTAINING  SINGLE  INORGANIC  SOLUTES  (DISSOCIATED) 

The effect of dissociated (ionized) inorganic solutes, $I$, on oxygen solubility is modeled by assuming that
only a fraction $\phi$ of the water is available for the dissolution of $\mathrm{O_2}$, the remaining fraction $(1-\phi)$ being
unavailable due to near and distant interactions between the solute ions (cations and anions) and water
molecules [3]. This concept arose from similarities in the difference between $C_P$ values of the $(\mathrm{O_2})_{aq}$ and
$(\mathrm{O_2})_g$ species and the difference between the $C_P$ values of metals and their corresponding cations [2]. Thus
the oxygen solubility $(c_{aq})_I$ in the presence of a single $I$ is given by Eq. 11,

$$(c_{aq})_I = \phi c_{aq} = \phi P_{O_2} k \tag{11}$$

where $k$ and $c_{aq}$ have the same values as those in Eqs. 8 and 10, respectively.

For modeling purposes, it is sufficient to treat the dependence of $\phi$ on $C_I$ in terms of an empirical function,
$\phi = f(C_I)$, where $\phi \to 1$ as $C_I \to 0$ and $\phi \to 0$ as $C_I \gg 1$. A suitable function requiring positive values for
the coefficient $\kappa$ and exponents $y$ and $\eta$ is given by Eq. 12, leading to Eq. 13 after combining with Eq. 11.

$$\phi = \{1 + \kappa(C_I)^y\}^{-\eta} \tag{12}$$

$$(c_{aq})_I = P_{O_2} k \{1 + \kappa(C_I)^y\}^{-\eta} \tag{13}$$

Values of $\kappa$, $y$ and $\eta$ have been calculated for 26 $I$ at 298 K and 1 atm $P_{O_2}$ [3,15,16], using published
oxygen solubility data. They are listed in Table 2.
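The single-solute model of Eqs. 12 and 13 can be sketched as follows, using the Table 2 coefficients of two solutes (NaCl and CaCl₂) as sample data. The dictionary layout, function names and the 1 m example concentration are assumptions made for illustration.

```python
import math

# (kappa, y, eta) from Table 2 for two solutes, at 298 K and 1 atm P_O2
TABLE2 = {
    "NaCl":  (0.075502, 1.009502, 4.223927),
    "CaCl2": (0.179714, 0.984502, 2.71142),
}

def phi(solute, c_molal):
    """Eq. 12: fraction of the water available for dissolving oxygen."""
    kappa, y, eta = TABLE2[solute]
    return (1.0 + kappa * c_molal**y) ** (-eta)

def f_T(T):
    """Eq. 8: k = f(T) in mol/kg/atm, with T in kelvin."""
    numerator = (0.046 * T**2
                 + 203.35 * T * math.log(T / 298.0)
                 - (299.378 + 0.092 * T) * (T - 298.0)
                 - 20.591e3)
    return math.exp(numerator / (8.3144 * T))

def c_aq_single(p_o2_atm, T, solute, c_molal):
    """Eq. 13: molal oxygen solubility in a solution of one dissociated solute."""
    return p_o2_atm * f_T(T) * phi(solute, c_molal)

for name in TABLE2:
    print(name, phi(name, 1.0), c_aq_single(1.0, 298.0, name, 1.0))
```

With these coefficients, 1 m NaCl leaves a $\phi$ of about 0.74, so the predicted solubility is roughly three quarters of the pure-water value at the same $T$ and $P_{O_2}$.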


The shapes of the $C_I$-$\phi$ curves based on the data in Table 2 are shown in Figures 2(a) to 2(c). Curves for
HCl and the halide salts are included in 2(a), H₂SO₄ and sulfates in 2(b), and alkaline hydroxides and other
salts in 2(c). The behavior of aqueous ammonia $(\mathrm{NH_3})_{aq}$, shown in Fig. 2(c), will be discussed later.




Considering a common anion, Fig. 2 suggests the effect of a metal salt on $\phi$ is related to the position of the
metal in the Periodic Table. For chlorides, Fig. 2(a), $\phi$ decreases in the order Group IA metals (Na,
K) > Group IIA (Mg, Ca, Ba) > Group IIIA (Al). For sulfates, Fig. 2(b), the trend is transition metals (Co, Ni,
Cu) > Group IA (Na, K) > Group IIA (Mg) and IIB (Zn) > Group IIIA (Al). Similarly, for nitrates in Fig. 2(c),
Group IA (Na) > Group IIA (Ca). Additionally, Figure 2 suggests that 1:1 and 2:1 salts (cation:anion ratio)
having a common anion with cations from the same Periodic Group tend to exhibit similar $\phi$-fractions.
Thus, $\phi$-fractions of salts for which there are no data may be estimated from similar salts in Fig. 2 and
inserted in Eq. 13.


Table 2. Values of κ, y and η for dissociated (ionized) solutes, Eq. 12, at 298 K and 1 atm P_O2.

Solute I      κ          y          η          Solute I      κ          y          η
HCl           0.305514   1.092174   0.232093   [illegible]   2.23207    1.115617   0.222794
[illegible]   2.01628    1.253475   0.168954   [illegible]   2.23207    1.115617   0.222794
NaOH          0.102078   1.00044    4.308933   [illegible]   0.179714   0.984502   2.71142
KOH           0.102078   1.00044    4.308933   CaCl₂         0.179714   0.984502   2.71142
NaBr          0.034541   0.925947   7.095218   [illegible]   0.179714   0.984502   2.71142
KBr           0.034541   0.925947   7.095218   [illegible]   0.38142    0.804022   1.683714
NaCl          0.075502   1.009502   4.223927   Al₂(SO₄)₃     0.641163   0.954719   3.033594
KCl           0.407374   1.116089   0.842095   [illegible]   0.314      1.084      0.883
[illegible]   0.629498   0.911841   1.440175   Ca(NO₃)₂      0.020554   0.946932   21.04
[illegible]   0.55       0.911841   1.440175   [illegible]   0.34       1.1        3.13
[illegible]   0.119674   1.107738   5.455537   Na₂SO₃        0.332      1.03       2.67
ZnSO₄         0.232671   1.010428   2.655655   NH₄Cl         0.57       1.2        0.278
[illegible]   2.23207    1.115617   0.222794   [illegible]   0.69       1.11       0.749

Figure 2. Effect of concentration of inorganic solute $C_I$ on the $\phi$-fraction (m is mol/kg H₂O).

The $\phi$-function of Eq. 12 becomes a very useful modeling parameter if it is assumed to be independent of $T$,
at least as far as first approximation treatments are concerned. In this event, values of $\kappa$, $y$ and $\eta$ obtained
under a known set of conditions, such as 298 K and 1 atm $P_{O_2}$ in Table 2, may be used under all other
conditions in Eq. 13. An examination of this assumption, using limited data relating to the effects of $I$ at
different $T$ in H₂SO₄, KOH, NaOH, NaCl and CuSO₄ solutions, suggests it is reasonably justified [3]. An
example is shown for CuSO₄ solutions [3] in Fig. 3, where predicted behaviors from Eq. 13 and Table 2
compare favorably with experimental data obtained from the studies of Bruhn et al. [17]. Hence, Eq. 13
becomes the unifying model equation for describing oxygen solubility behavior in the presence of a single
$I$, where $k$ is the only $T$-dependent parameter.




Fig. 3. Measured (experimental) and predicted oxygen solubility (Eq. 13) for single $I$, CuSO₄.


MULTIPLE  INORGANIC  SOLUTES  (DISSOCIATED) 

Multiple $I$ is the more common situation for industrial oxygenated solutions. In the presence of $z$ different
solutes $I_1, I_2, \ldots, I_z$ with $\phi$-factors $\phi_1, \phi_2, \ldots, \phi_z$, arranged so that $\phi_1 < \phi_2 < \ldots < \phi_z$, the overall effective value $\phi_{eff}$
will be dominated by $\phi_1$. The remaining factors will then exert a multiplying effect on $\phi_1$ to produce $\phi_{eff}$.
Under these circumstances, $\phi_{eff}$ may be reasonably represented by a function such as Eq. 14 [3],

$$\phi_{eff} = \phi_1 \left(\prod_{i=2}^{z} \phi_i\right)^{q} \tag{14}$$

where $\prod_{i=2}^{z} \phi_i$ represents the product $\phi_2 \times \phi_3 \times \ldots \times \phi_z$ and $q$ is an empirical exponent, $1 > q > 0$, that has been
shown to have a value close to 0.8 [3].

After substituting $\phi_{eff}$ for $\phi$ in Eq. 11, and $q = 0.8$, the solubility model for multiple $I$ becomes

$$(c_{aq})_I = P_{O_2} k \phi_{eff} = P_{O_2} k \phi_1 \left(\prod_{i=2}^{z} \phi_i\right)^{0.8} \tag{15}$$

where $k$ is given by Eq. 8.
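The combination rule of Eqs. 14 and 15 can be sketched as below. The brine composition (2.0 m NaCl with 0.5 m CaCl₂) is a hypothetical example chosen only because both solutes have coefficients in Table 2; the function names are likewise assumptions.

```python
import math

def phi_single(kappa, y, eta, c_molal):
    """Eq. 12 for one dissociated solute."""
    return (1.0 + kappa * c_molal**y) ** (-eta)

def phi_effective(phis, q=0.8):
    """Eq. 14: the smallest factor dominates; the others enter as a product raised to q."""
    ordered = sorted(phis)                        # phi_1 <= phi_2 <= ... <= phi_z
    return ordered[0] * math.prod(ordered[1:]) ** q

def f_T(T):
    """Eq. 8: k = f(T) in mol/kg/atm, with T in kelvin."""
    numerator = (0.046 * T**2
                 + 203.35 * T * math.log(T / 298.0)
                 - (299.378 + 0.092 * T) * (T - 298.0)
                 - 20.591e3)
    return math.exp(numerator / (8.3144 * T))

# Hypothetical brine: 2.0 m NaCl + 0.5 m CaCl2 (illustrative concentrations)
phis = [
    phi_single(0.075502, 1.009502, 4.223927, 2.0),  # NaCl coefficients, Table 2
    phi_single(0.179714, 0.984502, 2.71142, 0.5),   # CaCl2 coefficients, Table 2
]
phi_eff = phi_effective(phis)
solubility = 1.0 * f_T(298.0) * phi_eff             # Eq. 15 at 1 atm and 298 K
print(phis, phi_eff, solubility)
```

Sorting before combining means the caller does not have to order the solutes by hand; with a single solute the product is empty and Eq. 14 reduces to Eq. 12.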


The predictions of Eq. 15 compare satisfactorily with oxygen solubility measured by Cramer [18] in a brine
solution in Fig. 3(a) and with measurements by Hayduk [9] in two H₂SO₄/ZnSO₄ solutions characteristic of
Zn pressure leaching in Fig. 3(b). Cramer's [18] measurements in pure water are included in Fig. 3(a) and
compare very favorably with predicted values (Eq. 10). Cramer's data are a good test of the model
because $P_{O_2}$ ranged between 42.4 and 51.3 atm.


[Figure 3: panels (a) and (b); horizontal axis T (K).]

Figure 3. Measured (experimental) and predicted oxygen solubility (Equation 15) for multiple $I$.




UNDISSOCIATED  INORGANIC  SOLUTES  (MIXED  SOLVENT  EFFECT) 

Ammoniacal sulfate solutions are used for oxygen pressure leaching of Ni-based sulfide ores [19]. They
contain undissociated "free ammonia" $(\mathrm{NH_3})_{aq}$. Ammonia has a dipole moment and is a polar solvent in the
liquid state, analogous to water [16,20]. Thus $(\mathrm{NH_3})_{aq}$ and H₂O molecules are expected to behave similarly
and produce a mixed oxygen solvent effect. Hence oxygen solubility in simple aqueous ammonia solutions
("NH₄OH"), where the dissolved species is almost entirely "free ammonia" $(\mathrm{NH_3})_{aq}$ and not the NH₄⁺ ion
[16], should increase with ammonia concentration. Analysis [16] of oxygen data for "NH₄OH" solutions
[17,21] between 298 and 423 K showed that $\phi$ increased linearly with the molal concentration of $(\mathrm{NH_3})_{aq}$ at
constant oxygen pressure, as indicated in Figure 2(c),

$$\phi_{NH_3} = 1 + C_{NH_3}(0.0105) \tag{16}$$

Thus, in a solution containing $z$ dissociated $I$ solutes and "free ammonia" $(\mathrm{NH_3})_{aq}$, the overall final value of
the $\phi$-factor, $\phi_f$, is best represented by $\phi_f = \phi_{NH_3} \times \phi_{eff}$, where $\phi_{eff}$ is calculated from the $z$ solutes (Eq. 14).

Hence, from Eq. 15, oxygen solubility $(c_{aq})_{I+NH_3}$ in the presence of $(\mathrm{NH_3})_{aq}$ and dissociated $I$ becomes,

$$(c_{aq})_{I+NH_3} = P_{O_2} k \phi_f = P_{O_2} k \phi_{NH_3} \phi_{eff} = P_{O_2} k \phi_{NH_3} \phi_1 \left(\prod_{i=2}^{z} \phi_i\right)^{0.8} \tag{17}$$

Using a representative leaching solution containing 0.876 m NiSO₄, 0.108 m CuSO₄, 1.752
m (NH₄)₂SO₄ and 3.84 m $(\mathrm{NH_3})_{aq}$ [18], together with Eqs. 14 and 16, and Table 2, it is seen that $\phi_f$ is 0.447
(i.e. 1.04 x 0.43), which may then be inserted in Eq. 17 to obtain the estimated oxygen solubility.
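The arithmetic of the ammoniacal example can be reproduced with a short sketch. The value of 0.43 for the combined factor of the dissociated solutes is taken directly from the text rather than recomputed; the function names and the final evaluation at 1 atm and 298 K are assumptions added for illustration.

```python
import math

def phi_nh3(c_nh3_molal):
    """Eq. 16: linear increase of phi with free-ammonia molality."""
    return 1.0 + 0.0105 * c_nh3_molal

def f_T(T):
    """Eq. 8: k = f(T) in mol/kg/atm, with T in kelvin."""
    numerator = (0.046 * T**2
                 + 203.35 * T * math.log(T / 298.0)
                 - (299.378 + 0.092 * T) * (T - 298.0)
                 - 20.591e3)
    return math.exp(numerator / (8.3144 * T))

PHI_EFF = 0.43                     # value quoted in the text for the dissociated solutes
phi_f = phi_nh3(3.84) * PHI_EFF    # overall factor for 3.84 m free ammonia

# Eq. 17 evaluated at an illustrative condition of 1 atm O2 and 298 K
solubility = 1.0 * f_T(298.0) * phi_f
print(round(phi_nh3(3.84), 3), round(phi_f, 3), solubility)
```

The two factors act in opposite directions: free ammonia raises the available solvent fraction slightly above unity, while the dissociated sulfates reduce it by more than half.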

CONCLUSION 

A  unified  model  has  been  developed  to  estimate  oxygen  solubility  in  simple  and  relatively  complex 
industrial  solutions  where  the  oxidizing  characteristics  of  dissolved  oxygen  are  important. 

REFERENCES 

1. Oxygen and Ozone, 1981, R. Battino (Ed.), IUPAC Solubility Data Series, Vol. 7, Pergamon, Elmsford, NY.

2. D. Tromans, 1998, Hydrometallurgy, 48, 327-342.

3. D. Tromans, 1998, Hydrometallurgy, 50, 279-296.

4. International Critical Tables of Numerical Data, Physics, Chemistry and Technology, 1928, E.W.
Washburn (Ed.), Vol. III, National Research Council, McGraw-Hill, New York, 45-171.

5. J.P. Hoare, in Standard Potentials in Aqueous Solutions, 1985, A.J. Bard, R. Parsons and J. Jordan
(Eds.), IUPAC, Marcel Dekker, New York, 49-66.

6. H.M. Spencer, 1945, J. Am. Chem. Soc., 67, 1859-1860.

7. W. Wagner and K.M. de Reuck, Oxygen, IUPAC International Thermodynamic Tables of the Fluid
State-9, 1987, Blackwell, Oxford, UK, 84-85.

8. S. Glasstone, K. Laidler and H. Eyring, 1941, Theory of Rate Processes, McGraw-Hill, New York, 504-505,
516-525.

9. W. Hayduk, 1991, Final Report Concerning the Solubility of Oxygen in Sulfuric Acid-Zinc Pressure
Leaching Solutions, DDS Contract #UP 23440-8-9073/01-SS, Dept. Chem. Eng., Univ. of Ottawa, ON.

10. H.A. Pray, C.E. Schweickert and B.H. Minnich, 1952, Ind. Eng. Chem., 44, 1146-1151.

11. A. Broden and R. Simonson, 1978, Sven. Papperstidning, 81, 541-544.

12. E.I. Stephan, N.S. Hatfield, R.S. Peoples and H.A. Pray, 1956, Rep. BMI 1067, Battelle Memorial Inst.

13. R. Battino, 1981, in Oxygen and Ozone, R. Battino (Ed.), IUPAC Solubility Data Series, Vol. 7,
Pergamon, Elmsford, NY, 1-5.

14. B.B. Benson, D. Krause and M.A. Peterson, 1970, J. Soln. Chem., 8, 655-690.

15. D. Tromans, 1998, submitted to Tappi J.

16. D. Tromans, 1999, submitted to Hydrometallurgy.

17. G. Bruhn, J. Gerlach and F. Pawlek, 1965, Z. Anorg. Allgem. Chem., 337, 68-79.

18. S.D. Cramer, 1980, Ind. Eng. Chem., Process Design Dev., 19, 300-305.

19. F.A. Forward and V.N. Mackiw, 1955, J. of Metals, 7, No. 3, 457-463.

20. D. Nicholls, 1979, Inorganic Chemistry in Liquid Ammonia, Elsevier Scientific, Amsterdam, 1-28.

21. E. Narita, F. Lawson and K.N. Han, 1983, Hydrometallurgy, 10, 21-37.




On the Oxidation of Steel in CO₂ and Air

Gity  Samadi  Hosseinali  and  Ainul  Akhtar 

Powertech  Labs  Inc. 

12388  -  88th  Avenue,  Surrey, 

British  Columbia,  Canada  V3W  7R7 


ABSTRACT 

Oxidation of a low alloy 0.3% carbon steel (SAE 4130) was investigated using Scanning Electron
Microscopy (SEM), X-ray diffraction and optical metallography. Closed cylindrical specimens were
treated at 860°C for up to 2 h in air and carbon dioxide, under pressure in the range of 0.1-0.7 MPa. At the
initial stage, wustite grew through the coalescence of small clusters of crystallites. The size of wustite
grains was dependent on the kind of gas and independent of the gas pressure. The time dependence of
wustite grain growth has been studied. It was found that during the initial stage of oxidation wustite grains
grow epitaxially and by grain coalescence. Isothermal transformation of wustite into magnetite (Fe₃O₄) was
studied in the temperature range of 550°C-610°C. The results suggest that it may be possible to have the
desired proportion of wustite and magnetite, which can be controlled by the processing environment.


INTRODUCTION 

It is well known that when a steel is heated in flowing air, it oxidizes and forms a scale composed of three
kinds of iron oxide: FeO (wustite), Fe₃O₄ (magnetite) and Fe₂O₃ (hematite). They are formed one above the
other, wustite remaining adjacent to the metal and hematite furthest away from it. The first oxidation stage
consists of wustite crystal growth at 860°C on the substrate. If cooled rapidly to room temperature, wustite
does not decompose to magnetite. Wustite is converted into magnetite through a subsequent heat treatment;
it is generally believed that the wustite to magnetite transition takes place below 570°C.

The  iron  oxide  is  probably  formed  through  a  process  involving  nucleation  and  growth.  Nucleation  entails 
collision  of  gas  molecules  and  their  equilibration  on  the  substrate.  A  nucleus  may  grow  by  either  surface 
diffusion  or  through  direct  collision.  In  the  former  case,  two  or  more  neighboring  nuclei  coalesce,  so  the 
nucleus  grows  by  surface  diffusion.  The  aim  of  this  investigation  was  to  examine  the  role  of  different 
atmospheres  on  the  growth  mechanism  and  morphology  of  the  oxide.  Results  on  the  growth  of  magnetite 
and  other  higher  oxides  will  be  presented  elsewhere. 


EXPERIMENTAL 

Hollow  cylinders  of  AISI  4130  steel  were  used  in  the  oxidation  experiments.  The  dimensions  of  the 
cylinder  were  as  follows: 

Outer diameter: 51 mm
Inner diameter: 40 mm
Length: 102 mm

The inside of each cylinder was grit blasted (alumina) to remove existing scale on the surface. End caps
were welded. The cylinders were sealed off at room temperature (a shut-off valve was used) and then
heated. The furnace was brought to 860°C and, after steady state was achieved, the cylinder was inserted
into the furnace tube. The specimen attained a temperature of 860°C in approximately 10 minutes.

The austenitizing treatment was carried out for different times. Upon completion of the austenitizing
treatment, the specimen was quenched in a water bath. Thereafter, the cylinder was depressurized and cut
into small pieces for the characterization of the coating by using SEM (Scanning Electron Microscopy),
X-ray diffraction (XRD) and eddy current testing. Cobalt Kα radiation was used for the X-ray diffraction. The
oxide coating thickness was measured using a transverse section of the cylinder. The specimen was
mounted and polished using metallographic preparation techniques. Etching was carried out in 5% Nital
before making the scale thickness measurement. Figure 1 shows the experimental setup.


RESULTS 

A large number of cylinders were heat-treated with different pressures of air and CO₂. Table 1 shows a
summary of samples oxidized in air. Specimens treated with CO₂ are listed in Table 2.


Table 1: Summary of treatment tests carried out in air.
The specimens were treated at 860°C for 1 h unless stated otherwise.

Cylinder identification    Air pressure kPa (psig)
5A                         207 (30)
8A                         276 (40), 1 h at 880°C
[illegible entry]
10A                        414 (60)
14A                        621 (90)
18A                        621 (90), 2 h at 860°C
[illegible entry]

Table 2: Summary of treatment tests carried out in carbon dioxide.
The specimens were treated at 860°C for one hour unless stated otherwise.

Cylinder identification    CO₂ pressure kPa (psig)           Oxide detected
1D                         207 (30)                          FeO
2D                         310 (45)                          FeO
7D                         552 (80)                          FeO
9D                         552 (80), 2 h at 860°C            FeO
10D                        345 (50), 2 h at 860°C            FeO
12D                        345 (50), 30 minutes at 860°C     FeO
13D                        345 (50), 15 minutes at 860°C     FeO
14D                        345 (50), 5 minutes at 860°C      -
15D                        345 (50), 10 minutes at 860°C     FeO
17D                        Atmospheric pressure              FeO
19D                        345 (50)                          FeO


Examination  of  the  as  quenched  specimens 

Features  of  the  specimens  oxidized  in  air  are  presented  first  followed  by  those  obtained  in  carbon  dioxide. 

The X-ray diffraction patterns obtained from the interior surface of air oxidized cylinders showed only FeO
and Fe peaks. This observation suggests that the FeO, which formed at 860°C, does not transform into
other oxides of iron upon quenching. The intensity of the FeO (200) reflection was found to be dependent
on the time of oxidation and the air pressure. Metallographic measurements gave scale thickness values in
the range of 15-35 µm. SEM examination of the interior surface of any given cylinder showed that the
surface grain size did not vary significantly from one region to another. The grains on the surface of the
oxide scale were 1-5 µm in size and had an equiaxed structure.

Specimens treated with CO₂ also produced reflections in their diffraction patterns which were characteristic
of only Fe and FeO. The intensities of FeO peaks were generally higher with CO₂ than with air.
Metallographic measurements revealed that the coatings with CO₂ were thicker than those obtained with
air. The coating thickness ranged from 6 µm to 50 µm depending on the time of oxidation and the
pressure of CO₂. The wustite grains were at least twice as large with CO₂ as with air.

It may be noted from Table 2 that 6 cylinders (10D through 15D and 19D) were treated under a constant
pressure of 345 kPa (50 psig), while the exposure time at 860°C was varied between 5 minutes and 120
minutes. Of these, the specimen exposed at 860°C for 5 minutes (14D) did not have a detectable oxide
scale, as revealed through X-ray diffraction and SEM examination of the interior surface. Metallographic
polishing of a transverse section of the cylinder followed by etching in a 5% Nital solution showed that the
steel substrate had a ferrite-pearlite microstructure. This is typical for this steel when it is heated to a
temperature below 730°C rather than to the desired oxidation temperature of 860°C. A martensitic
structure was observed in the substrate of specimen 15D, which had been kept in the furnace for 10
minutes prior to being quenched. Figure 2 shows an SEM photograph of the interior surface of specimen
15D, which was exposed to 860°C for 10 minutes. It may be noted from Figure 2 that the grain size remains
relatively uniform. Figure 3 shows the morphology of the scale for specimen 10D, which had been exposed
for a much longer period of 120 minutes at 860°C. This longer exposure under 345 kPa (50 psig) produced a
non-uniform distribution of oxide grain sizes in any given region: some grains were large while
others were small, as seen in Figure 3.


Fig.2. Morphology of the oxide after 10 minutes heat treatment under 345 kPa CO2.




Fig.3. Morphology of the oxide after 2 hours heat treatment under 345 kPa CO2.

Figure 4 shows the time dependence of scale thickness under a constant pressure of 345 kPa CO2.

[Plot: scale thickness (vertical axis, 0 to 60) against time in minutes (horizontal axis, 0 to 150).]

Fig.4: Time dependence of the scale thickness. The atmosphere was 345 kPa (50 psig) CO2.


Treatment  below  570°C 

Some of the quenched specimens were tempered at different temperatures (550°C-610°C) and for different times.
X-ray diffraction indicated that a wide range of oxide microstructures can be obtained
using appropriate combinations of time and temperature. Desired combinations of wustite and magnetite
may be introduced into an oxide scale in this manner. The kinetics of the transformation of wustite into
magnetite and hematite, as well as the associated microstructural features, will be discussed elsewhere.




DISCUSSION 

This study made it possible to understand the growth mechanism of wustite and magnetite. The
characteristics of the oxide grain growth are similar to those of subgrain coalescence during the
recrystallization of cold-worked metals [1, 2]. In the recrystallization process, the boundaries between
neighboring subgrains gradually disappear. This process occurs by the gradual migration of dislocations
from the disappearing subgrain boundaries into the boundaries of neighboring subgrains. The boundary energy
increases as the misorientation increases up to 10-20°, beyond which the grain boundary structure can no longer be
explained simply by an array of lattice dislocations [3].
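The rise of boundary energy with misorientation in the low-angle regime is commonly described by the Read-Shockley relation, in which the boundary is treated as an array of lattice dislocations. The sketch below is not taken from this paper: the 15° cut-off is an assumed value chosen from within the 10-20° range quoted above, and the energy is normalised to the high-angle value.

```python
import math


def read_shockley(theta_deg: float, theta_max_deg: float = 15.0,
                  gamma_max: float = 1.0) -> float:
    """Normalised boundary energy for a misorientation theta (degrees).

    Below the cut-off angle (assumed 15 degrees here) the Read-Shockley
    form gamma_max * x * (1 - ln x), with x = theta / theta_max, applies.
    Above it the energy is taken as constant (high-angle boundary).
    """
    if theta_deg <= 0.0:
        return 0.0
    if theta_deg >= theta_max_deg:
        return gamma_max
    x = theta_deg / theta_max_deg
    return gamma_max * x * (1.0 - math.log(x))


for theta in (1, 2, 5, 10, 15, 30):
    print(f"theta = {theta:2d} deg -> relative energy {read_shockley(theta):.3f}")
```

In this picture, merging two subgrains separated by a low-angle boundary removes boundary energy, which is the driving force for coalescence discussed in the text.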

Energetically, the decrease in grain boundary energy through coalescence of many grains with small
misorientations can act as a driving force for the elimination of subgrain boundaries. Grain coalescence is
more active during the early stages of oxide grain growth, i.e., from oxide grain nucleation through
coalescence to large grains up to 10 µm in size. Because of the large boundary area available during the
early stages of oxidation, the driving force for grain growth is higher then than in the later stages of
growth.

When the oxide scale becomes thicker, grain coalescence does not occur because of the lack of
low-angle boundaries, which in turn gives rise to low mobility of dislocations. Consequently, nucleation of a
new grain takes place in an area near a grain boundary [4].

Although  the  simple  mechanism  of  nucleation  and  growth  presented  here  accounts  for  some  of  the 
observed  features,  the  process  of  wustite  growth  is  complex.  For  instance,  the  study  has  also  shown  that 
wustite  penetrates  well  into  the  austenite  grain  boundaries.  This  and  other  features  of  the  oxide  scale 
growth  will  be  discussed  elsewhere. 

Another feature observed in this work is the difference in the size of wustite grains grown
under different atmospheres. The wustite grains were found to be larger in CO2 than in
air. This could be due to the slower oxidation occurring under CO2. Because of the rapid reaction in
air, the grains of FeO that nucleated within a single austenite grain may contain a high
density of lattice defects. Such lattice defects could impede the subsequent coalescence of these FeO grains
into larger ones through the process of gradual migration of dislocations discussed above.

The  early  SEM  observation  of  the  tempered  specimens  showed  that  magnetite  starts  to  grow  on  the 
boundaries  of  wustite  grains.  Tempering  for  longer  times  results  in  the  conversion  of  the  wustite  grains  into 
magnetite. 


CONCLUSION 

In the initial stages of oxidation at 860°C in either air or CO2, it appears that discrete wustite nuclei
develop epitaxially on the substrate grains. Once the oxide nuclei grow laterally and impinge, grain
coalescence takes place. Grain coalescence is believed to be the major mode of grain growth. The
wustite grains remain smaller in air than in CO2. This could be due to the rapid oxidation
reaction that takes place in air.

Magnetite  forms  on  top  of  wustite  upon  low  temperature  tempering.  The  quantity  of  magnetite  can  be 
controlled  by  varying  the  temperature  and  the  exposure  time. 


REFERENCES 

1. H. Hu, 1962. Trans. Met. Soc. AIME, 224, 75-84.

2. J.C.M. Li, 1962. Journal of Applied Physics, 33, 2958-2965.

3. R. Bredesen, P. Kofstad, 1990. On the oxidation of iron in CO2+CO gas mixtures: I. Scale
morphology and reaction kinetics. Oxidation of Metals, 34 (5/6), 361-379.

4. M. Lee, R.A. Rapp, 1987. Coalescence of wustite grains during iron oxidation in a hot-stage
environmental SEM. Oxidation of Metals, 27 (3/4), 187-197.




Retardation Effects of Electrolytic ZrO2 Coating on Hydrogen
Embrittlement of AISI 430 Stainless Steel

I.  B.  Huang  and  S.  K.  Yen 

Institute  of  Materials  Engineering,  National  Chung  Hsing  University, 

Taichung,  40227,  Taiwan 

ABSTRACT 

Through Devanathan hydrogen permeation tests and a mathematical analysis, the retarding effect of
electrolytically deposited ZrO2 oxide films on hydrogen entry and hydrogen embrittlement has been investigated. The
permeation tests indicate that the diffusion coefficient without an oxide film, $D_m$ ($1.46 \times 10^{-8}$ cm²/s), is
reduced to $D_{Zr}$ ($8.81 \times 10^{-16}$ cm²/s) in the ZrO2 oxide film, and that the surface hydrogen
concentration $C_s^{m}$ ($2.06 \times 10^{-5}$ mol/cm³) is reduced to $C_i^{m}$ ($0.76 \times 10^{-5}$ mol/cm³) by the oxide
film and a high concentration ratio constant $K$ ($3.5 \times 10^{3}$).


INTRODUCTION 

Measurements of the diffusion coefficients and permeation rates of hydrogen through a metal membrane
have been widely investigated, owing not only to the sensitive electrochemical method developed by
Devanathan [1] but also to mathematical solutions of the pertinent diffusion equation given by
McBreen [2], Kiuchi and McLellan [3], and Yen and Shih [4]. On the other hand, a practical mathematical
solution for the permeation rate of hydrogen in a metal covered by an oxide film has rarely been reported,
since the time-dependent boundary condition at the metal/oxide interface makes the mathematical analysis more
complex. Although three decades ago Ash, Barrer and Palmer [5] developed a general treatment of the time lag
for diffusion in a multiple laminate ABCD..., where each lamina is composed of a different material, the
concentration ratio constant K in Henry's law for two adjacent materials is hard to measure independently,
which makes this method more difficult to apply.

Recently, many results have shown that metal oxide films effectively retard the hydrogen
embrittlement of metals, for example sputtered layers of TiO2 on 15-5 PH stainless steel [7], sputtered layers
of Al2O3 or SiO2 on 17-4 PH stainless steel [8], and a thermally grown oxide film on Sea-Cure stainless
steel [9]. The mechanism by which oxide films affect hydrogen entry and/or HE (hydrogen
embrittlement) has not been clearly identified. Swansiger and Bastaz [10] indicated that a low hydrogen
surface adsorption coefficient plays a leading role in preventing hydrogen entry, while Caskey [11] and
Piggott and Siarkowski [12-14] showed very low diffusivities ($10^{-12}$ to $10^{-17}$ cm²/s) of hydrogen
through oxides. Probably both factors contribute to the ability of oxide films to retard hydrogen entry
or embrittlement, but no direct evidence has so far shown which factor is dominant.
Although some efforts have been made by Song, Pyun and Oriani [15] to model the effect of an oxide film on
hydrogen permeation through a metal, they obtained a very low ratio of hydrogen concentration in the
metal to that in the oxide ($1.01 \times 10^{-6}$) and a very low diffusivity ($6.0 \times 10^{-19}$ cm²/sec). Probably this is
because the oxide film is too thin (~2 nm) to be considered as diffusion controlled. In this study,
retardation of hydrogen embrittlement was also found for an electrolytic ZrO2 coating on AISI 430 stainless
steel. The authors used the permeation measurements developed by Devanathan [1] and a
mathematical analysis by Yen [6] to find out whether a low hydrogen surface adsorption coefficient, a low
diffusivity of the electrolytic ZrO2 coating and/or a high concentration ratio constant K of coating to
metal dominates the retardation effect on hydrogen embrittlement.


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




Nomenclature 

$t_L^{m}$ : the time lag in the metal without an oxide film

$t_L^{Zr}$ : the time lag in the metal with a ZrO2 film

$L_m$ : the thickness of the metal

$L_{Zr}$ : the thickness of the oxide film

$D_m$ : the diffusivity of hydrogen in the metal

$D_{Zr}$ : the diffusivity of hydrogen in the ZrO2 film

$C_s^{m}$ : the subsurface concentration of hydrogen on the metal without an oxide film

$C_s^{Zr}$ : the subsurface concentration of hydrogen on the ZrO2 film

$C_i^{m}$ : the interface concentration of hydrogen in the metal at $X = L_{Zr}$

$C_i^{Zr}$ : the interface concentration of hydrogen in the oxide at $X = L_{Zr}$

$J^{m}$ : the hydrogen flux of the metal without an oxide film at steady state

$J^{Zr}$ : the hydrogen flux of the metal with a ZrO2 film at steady state

$K$ : the concentration ratio constant of $C_i^{Zr}$ to $C_i^{m}$


MATHEMATICAL  ANALYSIS 

Now consider a metal with a thin ZrO2 film on its surface prepared for a hydrogen permeation test.
Before conducting the mathematical analysis, two assumptions are required:

1. The surface concentration on the out-diffusion side is always kept at zero.

2. $D_{Zr}$ and $K$ are constant and independent of the ZrO2 film thickness.

According to the analysis of R. Ash, R. M. Barrer and D. G. Palmer [5] and S. K. Yen [6]:

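$$t_L^{m} = \frac{L_m^{2}}{6 D_m} \tag{1}$$

and

$$t_L^{Zr} = \frac{\dfrac{L_{Zr}^{2}}{D_{Zr}}\left(\dfrac{L_{Zr}}{6 D_{Zr}} + \dfrac{K L_m}{2 D_m}\right) + \dfrac{L_m^{2}}{D_m}\left(\dfrac{K L_m}{6 D_m} + \dfrac{L_{Zr}}{2 D_{Zr}}\right)}{\dfrac{L_{Zr}}{D_{Zr}} + \dfrac{K L_m}{D_m}} \tag{2}$$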

Once steady state is reached, the hydrogen concentration should be linear both in the oxide film and in the metal
membrane, but possibly with different slopes, as shown in Fig. 1a and Fig. 1b.


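Then the flux of hydrogen in the metal without an oxide film is determined from:

$$J^{m} = D_m \frac{C_s^{m}}{L_m} \tag{3}$$

On the other hand, the flux of hydrogen in the metal with an oxide film is given by:

$$J^{Zr} = D_{Zr} \frac{C_s^{Zr} - C_i^{Zr}}{L_{Zr}} = D_m \frac{C_i^{m} - 0}{L_m} \tag{4}$$

By taking the ratio of Eq. 3 to Eq. 4, we derive:

$$\frac{J^{m}}{J^{Zr}} = \frac{C_s^{m}}{C_i^{m}} \tag{5}$$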


There are still two unknown constants, $D_{Zr}$ and $K$, in Eq. 2, which cannot be determined from a single
experiment. However, if two experiments are carried out on oxide films of the same chemical composition but
different thickness, Eq. 2 becomes:




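$$t_L^{Zr1} = \frac{\dfrac{L_{Zr1}^{2}}{D_{Zr}}\left(\dfrac{L_{Zr1}}{6 D_{Zr}} + \dfrac{K L_{m1}}{2 D_{m1}}\right) + \dfrac{L_{m1}^{2}}{D_{m1}}\left(\dfrac{K L_{m1}}{6 D_{m1}} + \dfrac{L_{Zr1}}{2 D_{Zr}}\right)}{\dfrac{L_{Zr1}}{D_{Zr}} + \dfrac{K L_{m1}}{D_{m1}}} \tag{2.1}$$

$$t_L^{Zr2} = \frac{\dfrac{L_{Zr2}^{2}}{D_{Zr}}\left(\dfrac{L_{Zr2}}{6 D_{Zr}} + \dfrac{K L_{m2}}{2 D_{m2}}\right) + \dfrac{L_{m2}^{2}}{D_{m2}}\left(\dfrac{K L_{m2}}{6 D_{m2}} + \dfrac{L_{Zr2}}{2 D_{Zr}}\right)}{\dfrac{L_{Zr2}}{D_{Zr}} + \dfrac{K L_{m2}}{D_{m2}}} \tag{2.2}$$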

Here $D_{Zr}$ and $K$ are assumed to be independent of the film thickness $L_{Zr}$. Combining Eq. 2.1
and Eq. 2.2 gives the following:

$$D_{Zr} = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a} \tag{6}$$

where the coefficients $a$, $b$ and $c$ depend only on the time lags $t_L^{Zr1}$ and $t_L^{Zr2}$, the film thicknesses
$L_{Zr1}$ and $L_{Zr2}$, the metal thicknesses $L_{m1}$ and $L_{m2}$, and the metal diffusivities $D_{m1}$ and $D_{m2}$.

$t_L^{Zr1}$ and $t_L^{Zr2}$ can be measured by permeation tests; $L_{m1}$ and $L_{m2}$ are known; $D_{m1}$ and $D_{m2}$ can be determined
by Eq. 1; and $L_{Zr1}$ and $L_{Zr2}$ can be measured. Consequently, $D_{Zr}$ can be calculated. Substituting the value of
$D_{Zr}$ into Eq. 2.1 or Eq. 2.2, $K$ can also be determined.
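The two-experiment procedure can be illustrated with a round-trip calculation. The sketch below is an assumption-laden illustration rather than the authors' own algebra: it assumes the standard two-layer laminate form for the coated time lag, rearranges it to K·D·(C·D − E) = A + B·D for each experiment, and eliminates K to obtain a quadratic in the film diffusivity. All names and the synthetic input values are hypothetical, and the data are generated by the forward model rather than taken from the measurements.

```python
import math


def time_lag_coated(l_zr, d_zr, l_m, d_m, k):
    """Forward model: assumed two-layer time lag (film on metal)."""
    num = ((l_zr**2 / d_zr) * (l_zr / (6 * d_zr) + k * l_m / (2 * d_m))
           + (l_m**2 / d_m) * (k * l_m / (6 * d_m) + l_zr / (2 * d_zr)))
    return num / (l_zr / d_zr + k * l_m / d_m)


def _terms(t_lag, l_zr, l_m, d_m):
    # Rearranged forward model: K * D * (C*D - E) = A + B*D, with D = D_Zr.
    a_ = l_zr**3 / 6
    b_ = l_zr * l_m**2 / (2 * d_m) - t_lag * l_zr
    c_ = t_lag * l_m / d_m - l_m**3 / (6 * d_m**2)
    e_ = l_zr**2 * l_m / (2 * d_m)
    return a_, b_, c_, e_


def solve_film(exp1, exp2):
    """Each experiment is (t_lag, l_zr, l_m, d_m).

    Returns the physically admissible (D_Zr, K) pairs, i.e. both positive.
    """
    a1, b1, c1, e1 = _terms(*exp1)
    a2, b2, c2, e2 = _terms(*exp2)
    # Eliminating K between the two experiments gives qa*D^2 + qb*D + qc = 0.
    qa = b1 * c2 - b2 * c1
    qb = a1 * c2 - b1 * e2 - a2 * c1 + b2 * e1
    qc = a2 * e1 - a1 * e2
    disc = qb * qb - 4 * qa * qc
    if qa == 0 or disc < 0:
        return []
    solutions = []
    for sign in (1.0, -1.0):
        d = (-qb + sign * math.sqrt(disc)) / (2 * qa)
        denom = d * (c1 * d - e1)
        if d > 0 and denom != 0:
            k = (a1 + b1 * d) / denom
            if k > 0:
                solutions.append((d, k))
    return solutions


# Synthetic check: generate two time lags from known values, then recover them.
D_TRUE, K_TRUE = 1.0e-15, 1.0e3   # hypothetical film diffusivity and ratio
L_M, D_M = 0.08, 1.46e-8          # metal thickness (cm) and diffusivity (cm^2/s)
experiments = [
    (time_lag_coated(l_zr, D_TRUE, L_M, D_M, K_TRUE), l_zr, L_M, D_M)
    for l_zr in (6.0e-6, 1.1e-5)
]
solutions = solve_film(*experiments)
for d, k in solutions:
    print(f"D_Zr = {d:.3e} cm^2/s, K = {k:.3e}")
```

Filtering on positive diffusivity and positive K is one way to choose between the two roots of the quadratic. With measured time lags the result is sensitive to experimental error, so a recovered pair should be checked by substituting it back into the forward model.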


Fig.1. Hydrogen concentration distribution after steady state: (a) uncoated and (b) coated specimens.




EXPERIMENTAL 

Sample  preparation 

An AISI 430 stainless steel sheet, as received, was used as the substrate for the electrolytic ZrO2 coating. Its
nominal chemical composition is listed in Table 1. The sheet thickness is 0.8 mm. The sheet was cut into
35 × 35 mm pieces. All specimens were polished to a mirror finish with 1 µm and 0.05 µm Al2O3 powder, then
degreased with detergent, ultrasonically cleaned in deionized water and acetone, and dried in air.
Table 1. The chemical composition of the AISI 430 stainless steel sheet.

Element   C       Mn      Si      Cr      Ni     P       S       Fe
Wt %      0.045   0.342   0.693   17.82   0.18   0.031   0.015   Balance

Electrolytic  deposition  and  annealing 

The electrolytic deposition of ZrO2 was conducted in a naturally aerated solution of 0.03125 M
ZrO(NO3)2 at pH = 2.7 and a constant cathodic potential of 850 mV for 250 and 500 s, using an
EG&G M273A potentiostat and M352 software. The specimen was the cathode, graphite the anode, and a
saturated calomel electrode the reference. These electrolytic conditions gave the most efficient
deposition in our experiments. The specimens with a Zr(OH)4 gel coating were then naturally dried in air
and annealed in air at 703 K for 120 min.

Hydrogen  permeation  test  and  OM  observation 

Details of the cell and ancillary apparatus used in the permeation measurements have been described in
previous papers [16, 17]. The cathodic compartment contained about 1 liter of a solution of 0.2 N
CH3COOH - 0.1 N CH3COONa·3H2O, and a constant current of 500 A/m² was applied to the cathodic surface
of the coated specimen. The anodic compartment contained about 1 liter of a solution of 0.1 N NaOH-H2O.
The surface of the specimen was maintained at a constant voltage of 50 mV (SCE) to keep the hydrogen
concentration on the surface at zero. The solutions were de-aerated with Ar in both sets of tests. The cell was
thermostatically controlled at 303 ± 1 K. The uncoated specimen was also measured. The surface
morphology of the coated specimen after the hydrogen permeation tests was observed by optical
microscopy (OM).


RESULTS  AND  DISCUSSION 

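The permeation fluxes of the specimens with and without electrodeposited ZrO2 coatings were
recorded. By integrating the flux curve, the time lags were found to be $t_L^{m}$ = 73200 sec for the uncoated
specimen, and $t_L^{Zr}$ = 124600 sec and 130500 sec for the specimens coated for 250 sec and 500 sec, respectively.
From Eq. 1 to Eq. 5, the steady-state flux, $D_{eff}$ and the surface hydrogen concentration can be calculated.
The corresponding data are listed in Table 2.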


Table 2. The data of time lag $t_L$, steady-state permeation current, steady-state flux, $D_{eff}$
(effective diffusion coefficient) and $C_s$ (surface hydrogen concentration).

Electrodeposition   t_L      Current   Flux                     D_eff             C_s
time                (sec)    (µA)      (×10⁻¹² mol/cm²·sec)     (×10⁻⁸ cm²/sec)   (×10⁻⁵ mol/cm³)

250 sec             124600   0.84      2.77                     0.856             1.52
500 sec             130500   0.42      1.38                     0.817             0.76
Uncoated            73200    1.14      3.76                     1.46              2.06
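The diffusivity and concentration columns of Table 2 can be cross-checked from the time lags and fluxes. The sketch below is illustrative: it assumes that the flux column is in units of 10⁻¹² mol/cm²·sec, that the effective diffusivity follows from the single-membrane time-lag relation (thickness squared over six times the time lag), and that the concentration in the metal is obtained as flux × thickness / metal diffusivity. The function and variable names are hypothetical.

```python
L_M = 0.08  # specimen thickness in cm (0.8 mm sheet)

# (time lag in sec, steady-state flux in mol/cm^2/sec); the 1e-12 flux
# exponent is an assumption about the table header.
TABLE_2 = {
    "uncoated": (73200.0, 3.76e-12),
    "250 sec": (124600.0, 2.77e-12),
    "500 sec": (130500.0, 1.38e-12),
}


def diffusivity_from_time_lag(t_lag, thickness=L_M):
    """Effective diffusivity (cm^2/sec) from the time lag of one membrane."""
    return thickness**2 / (6.0 * t_lag)


def concentration_from_flux(flux, diffusivity, thickness=L_M):
    """Entry-side concentration (mol/cm^3) for a linear steady-state profile."""
    return flux * thickness / diffusivity


# Diffusivity of the bare metal, taken from the uncoated time lag.
D_M = diffusivity_from_time_lag(TABLE_2["uncoated"][0])

results = {}
for label, (t_lag, flux) in TABLE_2.items():
    d_eff = diffusivity_from_time_lag(t_lag)
    conc = concentration_from_flux(flux, D_M)
    results[label] = (d_eff, conc)
    print(f"{label:>9}: D_eff = {d_eff:.3e} cm^2/sec, C = {conc:.3e} mol/cm^3")
```

Under these assumptions the computed values agree with the tabulated diffusivities and concentrations to within rounding.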

The surface morphologies of the coated and uncoated specimens after hydrogen diffusion were observed
by optical microscopy (OM). No blistering was found on the surface of the coated specimens, but many
blisters were observed on the surface of the uncoated specimen, as shown in Figures 2a, 2b and 2c.


Fig. 2. OM of the specimen after hydrogen permeation:
a. ZrO2 coated for 250 sec. b. ZrO2 coated for 500 sec. c. uncoated.

The thickness of the coating, $L_{Zr}$, can be measured by a surface profiler, and $L_m$ is already known as
the specimen thickness. From Eq. 1 to Eq. 6, $D_m$, $D_{Zr}$ and $K$ can be calculated. The corresponding data
are listed in Table 3.


Table 3. The data of $L_{Zr}$, $L_m$, $D_m$, $t_L^{Zr}$, $D_{Zr}$ and $K$ for AISI 430 stainless steel sheet specimens.

Electrodeposition   L_Zr (cm)    L_m (cm)    D_m (cm²/sec)   t_L^Zr (sec)   D_Zr (cm²/sec)   K
time

250 sec             6 × 10⁻⁶     8 × 10⁻²    1.46 × 10⁻⁸     124600         8.81 × 10⁻¹⁶     3.5 × 10³
500 sec             1.1 × 10⁻⁵   8 × 10⁻²    1.46 × 10⁻⁸     130500

From Table 2, $C_i^{m}$ ($0.76 \times 10^{-5}$ mol/cm³) has been reduced to be much lower than both
$C_s^{m}$ ($2.06 \times 10^{-5}$ mol/cm³) and the critical concentration $C_c$ ($2.3 \times 10^{-5}$ mol/cm³) for the brittle
transgranular fracture type [17], as obtained from the time lag calculation. The retarding effect of the electrolytic
ZrO2 coating on the hydrogen embrittlement of AISI 430 stainless steel is therefore due to the reduction of $C_i^{m}$,
which is much lower than $C_s^{m}$ and $C_c$. From Table 3, it is clear that $C_s^{Zr}$ is much larger than $C_s^{m}$, while $D_{Zr}$
is much lower than $D_m$. Therefore, a retarding effect due to a low absorption coefficient of the oxide film
can effectively be excluded. For short exposures to hydrogen, the lower $D_{Zr}$ will delay hydrogen
diffusion. However, for long exposure times, the high concentration ratio constant $K$, which reduces $C_i^{m}$ to
a value much lower than the critical concentration $C_c$, is the main factor in the retarding effect.
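The claim that hydrogen uptake at the film surface is not low can be checked with a short calculation. The sketch below is illustrative: it assumes linear steady-state concentration profiles in both layers, takes the concentration ratio constant as 3.5 × 10³ and the film diffusivity as 8.81 × 10⁻¹⁶ cm²/sec, and assumes that the measured flux for the 500 sec coating is 1.38 × 10⁻¹² mol/cm²·sec. The function names are hypothetical.

```python
K = 3.5e3        # concentration ratio constant (oxide side / metal side)
D_ZR = 8.81e-16  # hydrogen diffusivity in the film, cm^2/sec
D_M = 1.46e-8    # hydrogen diffusivity in the metal, cm^2/sec
L_M = 0.08       # metal thickness, cm


def interface_concentration_metal(flux):
    """Metal-side interface concentration from the steady-state flux."""
    return flux * L_M / D_M


def film_surface_concentration(flux, l_zr):
    """Concentration at the outer film surface.

    The flux through the film equals D_ZR * (C_surface - C_interface_oxide)
    / l_zr, and the oxide-side interface value is K times the metal-side one.
    """
    c_interface_oxide = K * interface_concentration_metal(flux)
    return flux * l_zr / D_ZR + c_interface_oxide


# Specimen coated for 500 sec: film thickness 1.1e-5 cm.
c_500 = film_surface_concentration(1.38e-12, 1.1e-5)
print(f"Film surface concentration: {c_500:.2e} mol/cm^3")
```

The result is about 4.4 × 10⁻² mol/cm³, some three orders of magnitude above the subsurface concentration in the uncoated metal, which is why a low surface concentration on the film can be ruled out as the retarding mechanism.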

SUMMARY  AND  CONCLUSIONS 

Through Devanathan hydrogen permeation tests and a mathematical analysis, $C_s^{m}$, $D_m$, $C_i^{m}$, $C_s^{Zr}$, $C_i^{Zr}$,
$D_{Zr}$ and $K$ have been determined. This novel method also suggests a way to check whether the low
diffusivity $D_{Zr}$, the high concentration ratio constant $K$ and/or a low surface concentration $C_s^{Zr}$ of
hydrogen on the ZrO2 film is dominant in controlling the retarding effect on hydrogen embrittlement and
entry into the metal. In this study, two conclusions are drawn:

1. The surface of the uncoated specimen shows blistering after hydrogen diffusion, but that of
the coated specimens does not. The electrolytically deposited ZrO2 film on AISI
430 stainless steel therefore has a retarding effect on hydrogen entry and hydrogen embrittlement.

2. The retarding effect of an electrolytic ZrO2 coating on hydrogen entry into AISI 430 stainless steel is
mainly due to the high concentration ratio constant $K$ ($3.5 \times 10^{3}$), which makes $C_i^{m}$ ($0.76 \times 10^{-5}$
mol/cm³) much less than $C_s^{m}$ ($2.06 \times 10^{-5}$ mol/cm³), and to the low diffusivity $D_{Zr}$
($8.81 \times 10^{-16}$ cm²/sec), but not to a low $C_s^{Zr}$, which is as high as $4.37 \times 10^{-2}$ mol/cm³.




ACKNOWLEDGMENT 

The  authors  are  grateful  for  the  support  of  this  research  by  the  National  Science  Council,  Republic  of 
China,  under  Research  Project  No.  36358D. 


REFERENCES 

1. M. A. V. Devanathan and Z. Stachurski, 1962. Proc. R. Soc. London, Ser. A, 270, p.90.

2. J. McBreen, L. Nanis, and W. Beck, 1966. J. Electrochem. Soc., 113, p.1218.

3. K. Kiuchi and R. B. McLellan, 1988. Acta Met., 31, p.961.

4. S. K. Yen and H. C. Shih, 1988. J. Electrochem. Soc., 135, p.1169.

5. R. Ash, R. M. Barrer and D. G. Palmer, 1965. Brit. J. Appl. Phys., 16, p.873.

6. S. K. Yen, 1999. Retarding Mechanism of Thermally Grown Oxide Films on Hydrogen Embrittlement of
AISI 430 Stainless Steel, Materials Chemistry and Physics, accepted.

7. J. G. Nelson and G. T. Murray, 1984. Metall. Trans. A, 15A, p.597.

8. G. T. Murray, J. P. Boufard, and D. Briggs, 1987. Metall. Trans. A, 18A, p.162.

9. S. K. Yen and H. C. Shih, 1962. Proceeding of the Annual Conference of the Chinese Society for
Material Science, p.646.

10. W. A. Swansiger and R. Bastaz, 1979. J. of Nuclear Materials, 85, p.335.

11. G. R. Caskey, 1974. Material Science and Engineering, 14, p.109.

12. M. R. Piggott and A. C. Siarkowski, 1972. J. Iron and Steel Institute, 210, p.901.

13. M. R. Louthan, Jr. and R. G. Derrick, 1975. Corrosion Science, 15, p.565.

14. R. A. Strehblow and H. C. Savage, 1974. Nuclear Technology, 22, p.127.

15. R. H. Song, S. L. Pyun and R. A. Oriani, J. Electrochem. Soc., 137, p.1703.

16. S. K. Yen and H. C. Shih, 1990. J. Electrochem. Soc., 137, p.2028.

17. S. K. Yen and Y. C. Tsai, 1996. J. Electrochem. Soc., 143, p.2736.




The  Effect  of  Ca  Addition  on  Viscosity  and  Electrochemical 
Properties  of  Mg-alloys  Produced  by  Casting 

Hye-Sung Kim*, Shuji Hanada*, Ha-guk Jeong*, and Dong-Wha Kum**


*Institute of Material Research, Tohoku University, Sendai, 980-8577, Japan
Tel: 81-022-215-2406 Fax: 81-022-215-2116 Email: Kim4385@.imr.tohoku.ac.jp

**Division  of  Metals,  Korea  Institute  of  Science  and  Technology, 

P.O.  Box  131,  Chungang,  Seoul,  Korea 


ABSTRACT 

The composition of Mg alloys is known to affect their current capacity, potential, and anode
efficiency. Many alloying elements have been used in attempts to improve the electrochemical properties of
magnesium anodes. Significant improvements in electrochemical properties have been achieved by using
alloying elements to control the adverse effects of impurity elements such as Fe, Ni and Cu.

Out of many elements, Ca is considered a very effective element for improving the electrochemical
properties of Mg alloys because of its relatively low potential in comparison with specified elements such
as Mn, Al and Zn in high-Mn alloys or AZ63 alloys, together with its grain-refining effect. Ca has recently been used
as a common inhibitor of the ignition of molten Mg alloys. However, the viscosity of pure Mg increases
markedly with increasing Ca content, so that Ca makes the casting of Mg alloys from the melt
difficult at desirable pouring temperatures.

In  the  present  study,  the  effect  of  Ca  addition  on  the  viscosity  and  electrochemical  properties  of  Mg-Ca 
alloys  is  investigated.  Viscosity  as  well  as  electrochemical  data  will  be  correlated  with  chemical 
composition  of  impurities,  and  the  microstructural  change  before  and  after  Ca  is  added. 




Bio-Compatible  Ceramics  as  Mimetic  Material 
for  Bone  Tissue  Substitution 

Zdenek  Strnad*,  Jaroslav  Sestak** 

*Laboratory for Glass and Ceramics (LASAK), Papirenska 25,
CZ-16000 Prague 6, Czech Republic
**Division of Solid-State Physics,

Institute of Physics of the Czech Academy of Sciences,
Cukrovarnicka 10, CZ-16253 Prague 6, Czech Republic


ABSTRACT 

Bone-like apatite formation on the surface of an implant is of key importance during the physical and
chemical processes leading to the formation of bonds between the implanted material and the newly formed
bone tissue. The smartness of such a mimetic process likely lies in the action of silanol groups (Si-OH), which
serve as nucleation sites for the formation of a biocompatible interface capable of coexisting between the
original tissue and the implant. Implants can be made from ceramics, glass-ceramics, composites, as well as
certain metals (titanium), provided that the condition of suitable surface reactivity is respected. Lasak Co. Ltd. is the
leading manufacturer of these materials in the Czech Republic and provides various kinds of bioactive
implants based on calcium phosphate ceramics, apatite-wollastonite glass-ceramics and implants with
hydroxyapatite surface coatings, permitting differentiated applications in clinical practice. The bioactive
materials used as bone substitutes are all the subject of continuing research to attain biological, mechanical
and chemical properties as similar as possible to those of the tissue to be replaced, i.e., mimetic materials.
Clinical applications in orthopaedics, neurosurgery, maxillofacial surgery, auricular surgery, dental surgery
and other fields are demonstrated.

Keywords: bioactive implants, bone substitutes, glass-ceramics, hydroxyapatite


INTRODUCTION 

Degeneration of the skeletal system over time results in dysfunction of bones, teeth and joints. Extensive bone
defects left after the removal of tumours, after infections or as a result of injuries are ideally replaced by
autogenic bone tissue. As the amount of this material available from the patient is limited and the use of allogenic bone
is accompanied by biological, mechanical and also sociological difficulties, there is a great need for
alternative materials.

Since the discovery of Bioglass in 1971 (1), various kinds of bioactive materials have been found and clinically
used. The uniqueness of surface bioactive materials is their high bioactivity, which opens qualitatively new
application fields, especially for anchoring of the implant in the host tissue, with practical use in
orthopedics, stomatology, neurosurgery, oncology, craniofacial surgery and possibly other fields.

LASAK developed and provides three basic kinds of bioactive materials, BAS-O, BAS-HA and BAS-R,
permitting differentiated applications in clinical practice. More than 7000 people have received these implants as
bone substitutes during the last eight years.


BAS-O GLASS-CERAMICS - BIOACTIVE LONG-TERM STABLE IMPLANT
MATERIAL WITH HIGH MECHANICAL STRENGTH

BAS-O is an inorganic, polycrystalline material prepared by controlled crystallization of glass, whose main
components are CaO, P2O5, SiO2, MgO and Al2O3. During the crystallization process, the amorphous
material is converted to a glass-ceramic material whose main crystalline phases are apatite and wollastonite
(2,3). Controlled crystallization permits not only controlled phase conversion during the process, but also
control of the chemical composition and structure of the residual glass phase, which are decisive factors in
determining the bioactivity of the final material (4).

BAS-O exhibits extraordinary biocompatibility, which has been demonstrated in many experiments and
clinical tests. The basic condition for the formation of a bond between the BAS-O implant and the living
bone tissue is the formation of a thin layer enriched in Ca and P on the glass-ceramic surface as a result of a
reaction between the implant and body fluids (Fig. 1).


Fig. 1. Surface layer formed on a BAS-O implant after exposure in a simulated body fluid for 28 days.
a. Cross-section through the surface layer (SEM 1000x).
b. Content of elements (P, Ca and SiO2) in the surface layer.

This  layer,  which  is  initially  amorphous,  changes  in  time  to  form  a  polycrystalline  layer  of  apatite 
agglomerates,  into  which  are  incorporated  organic  components  in  the  interface  zone,  produced  by 
osteoblasts,  such  as  collagen  fibers,  with  the  formation  of  a  tight  bond  (Fig.  2). 


Fig.2: Detail of the interface of the bone tissue with a BAS-O implant 6 months after implantation (SEM).

Fig.3: Photomicrograph of a section of BAS-O granule implant in bone, 2 years after implantation
(toluidine blue stain).




Thus,  the  implant  is  not  considered  to  be  a  foreign  body;  on  the  contrary,  a  strong  bond  is  formed  directly 
between  the  implant  and  the  bone  tissue,  without  an  intermediate  layer  of  soft  tissue,  in  a  time  period  of  4-8 
weeks  after  implantation  (Fig.  3). 

BAS-O exhibits intense osteoconductive properties and also the ability to form bonds between the
individual BAS-O particles in a body fluid medium. BAS-O is a white material with an apparent density of
3000-3100 kg/m3. The material has a bending strength similar to that of cortical bone, 170 MPa, and
approximately double the compressive strength, > 400 MPa. The strength of the junction of the BAS-O
implant with bone tissue, measured by the push-out test (with shear stress) after implantation for 2 months,
is 15-20 MPa.

Clinical  application 

BAS-O granules and ground material are used to fill cysts, defects left by injuries and defects left by
excochleation of benign tumors, and to reconstruct extensive acetabular defects (Fig. 4). Compact,
wedge-shaped blocks (with various heights and surfaces) can be used, e.g., for condyle elevation. Individually
shaped implants can be used in neurosurgery to cover defects left from cranial trepanation and as onlays in
plastic surgery.


Fig. 4: Reconstruction of extensive acetabular defects by bioactive glass-ceramics BAS-O in reoperation of
total endoprostheses. a. Schematic drawing of implantation (BAS-O granules). b. Radiogram taken 8 months
post-operatively. [5]

Specially shaped intervertebral prostheses have been developed and used successfully in spinal surgery
(Fig. 5). In cranio-facial surgery, the material can be used as plates, blocks, or individually shaped implants
to replace bone defects, to rebuild the orbit, and to reconstruct partial mandibular defects. It can also be used
to enlarge the mandible or for plastic correction of chin and nose profiles.


Fig. 5: a. BAS-O intervertebral prostheses. b. Radiogram taken post-operatively [6].


BAS-HA - BIOACTIVE NONRESORBABLE IMPLANT MATERIAL BASED ON
HYDROXYAPATITE Ca10(PO4)6(OH)2

Hydroxyapatite is synthesized from aqueous solutions under precisely defined pH, temperature and other
physical parameters, which ensure reproducible preparation of a highly pure, crystallographically defined
product that does not contain any unwanted calcium phosphates. This product is further processed to
yield the final BAS-HA product with defined biophysical properties. Its structure and composition are
similar to bio-apatite, which is the main inorganic component of living bone tissue. A strong bond forms
between the bone tissue and the implant material without an intermediate fibrous layer (Fig. 6).


Fig.6:  Direct  contact  between  the  BAS-HA  implant  and  bone  tissue,  2  months  after  implantation 

(thin-section,  toluidine  blue  stain) 

BAS-HA exhibits high biocompatibility, which has been demonstrated in many preclinical and clinical
tests, including tests of cytotoxicity, carcinogenicity and mutagenicity. The material exhibits
osteoconductive properties. After implantation in the defective part of the bone, new bone tissue is
formed in the space between the granules of this substance, so that a composite of the artificial material
and living bone tissue is formed.

BAS-HA  is  a  very  dense  ceramic  with  apparent  porosity  of  1.7  %.  The  Ca/P  molar  ratio  is  1.66.  The 
material  exhibits  a  bending  strength  of  60  MPa  and  compression  strength  of  200  MPa.  The  strength  of  the 
junction  with  the  bone  tissue  measured  by  the  push-out  test  (shear  stress)  is  equal  to  19  MPa  two  months 
after  implantation  and  29  MPa  4  months  after  implantation. 
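
The reported Ca/P molar ratio of 1.66 can be compared with the stoichiometric value implied by the formula in the section heading. The following is a minimal sketch that simply counts atoms in Ca10(PO4)6(OH)2 and, for contrast, in the tricalcium phosphate Ca3(PO4)2 on which BAS-R is based; the dictionary and function names are illustrative and do not come from the paper.

```python
from fractions import Fraction

# Atom counts read directly from the chemical formulas given in the headings.
HYDROXYAPATITE = {"Ca": 10, "P": 6}        # Ca10(PO4)6(OH)2
TRICALCIUM_PHOSPHATE = {"Ca": 3, "P": 2}   # Ca3(PO4)2


def ca_p_ratio(atoms):
    """Return the exact Ca/P molar ratio for a dict of atom counts."""
    return Fraction(atoms["Ca"], atoms["P"])


ha_ratio = ca_p_ratio(HYDROXYAPATITE)
tcp_ratio = ca_p_ratio(TRICALCIUM_PHOSPHATE)

print(f"hydroxyapatite       Ca/P = {float(ha_ratio):.3f}")
print(f"tricalcium phosphate Ca/P = {float(tcp_ratio):.3f}")
```

The stoichiometric hydroxyapatite value, 10/6, lies within about 0.01 of the measured 1.66, which is consistent with the statement that the product contains no unwanted calcium phosphates.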

Clinical  application 

BAS-HA material is designed for bone grafting, especially at sites where only compressive forces are
expected to act on the implant. It can be used in dentoalveolar surgery to fill bone defects left after
extirpation of cysts or surgical extractions, or to remodel the alveolar ridge. In periodontology, it can be used
to treat periodontal bone defects. Bioactive BAS-HA material is also used for the production of middle
ear implants (Fig. 7) and dental implants with a hydroxyapatite coating (Fig. 8) [7, 8].


Fig. 7: Middle ear implants made of BAS-HA.

Fig. 8: Dental implant (Impladent) with hydroxyapatite coating.


BAS-R - BIOACTIVE RESORBABLE IMPLANT MATERIAL BASED ON
TRICALCIUM PHOSPHATE Ca3(PO4)2

BAS-R is a surface bioactive, resorbable, inorganic, crystalline material based on tricalcium phosphate
(β-TCP). The material is prepared by a special procedure at high temperature by melting and controlled
cooling of the melt. BAS-R forms direct bonds with living bone tissue without forming a fibrous interlayer.


Fig. 9. Gradual resorption of granules of BAS-R and simultaneous formation of new bone tissue
at the edges of the granules, 8 months after implantation (thin section, toluidine blue stain).




The material greatly stimulates formation of new bone tissue and has osteoconductive properties. The
material gradually disintegrates in the body as a consequence of hydrolytic corrosion and active
phagocytosis, accompanied by resorption and replacement with newly formed bone tissue (Fig. 9). BAS-R
is white in color and has an apparent density of 2900-3100 kg/m³.


Clinical  application 

It is designed for bone replacement where resorption is required, with gradual replacement by living bone
tissue. It is used in periodontology and in dentoalveolar surgery to treat bone defects [10] (see Fig. 10).


Fig. 10: Filling of bone defects left by tooth extraction and extirpation of a radicular cyst: a) prior to
the operation, b) after application of BAS-R.


OUTLOOK 

Today bioceramics are used in a broad range of devices inside the human body, mainly because of their
good biocompatibility. Among the ceramic materials used for bone replacement, bioactive ceramics appear
particularly promising because of their ability to form a stable interface with living host tissue. The major
problem of these materials, which inhibits their application in several types of implants, is the poor match
between the mechanical behavior of the implant and that of the tissue to be replaced. Coatings, composites,
porous structured materials and/or new resorbable materials are the principal promising routes for further
development.

However, no one has succeeded to date in finding a material that fully corresponds to bone or other
living parts of the body. It is the task of a growing number of researchers and institutions working in the
field of biomaterials to further improve the performance of these materials. Nature is still the better engineer.

REFERENCES 

1. L.L. Hench, R.S. Splinter, W.S. Allen, 1971. Bonding mechanisms at the interface of ceramic prosthetic
materials, J. Biomed. Res. Symp., 2, 117.

2. Z. Strnad, 1986. Glass-Ceramic Materials: Liquid Phase Separation, Nucleation and Crystallization in
Glasses, Elsevier, Amsterdam.

3. Z. Strnad, K. Urban, 1989. Surface Bioactive Glass-Ceramic Materials, Sklar a keramik, 39, 292.

4. Z. Strnad, 1992. Role of the Glass Phase in Bioactive Glass-Ceramics, Biomaterials, 13(5), 317.

5. K. Urban, P. Sponer, 1998. Reconstruction of Extensive Acetabular Defects by Bioactive Glass Ceramics
in Re-operations of Total Endoprostheses, Acta Chir. Orthop. Traum. Cech., 65, 17.

6. M. Filip, P. Veselsky, Z. Strnad, P. Lanik, 1995. The Replacement of the Intervertebral Disc by Ceramic
Prosthesis in Treatment of Degenerative Diseases of the Spine, Acta Chir. Orthop. Traum. Cech., 62, 226.

7. A. Simunek, A. Stepanek, V. Zabrodsky, Z. Nathansky, Z. Strnad, 1997. A 3-year Multicenter Study on
Osseointegrated Implants - Impladent, Quintessenz, 6(3).

8. Z. Strnad, J. Strnad, M. Psotova, C. Povyal, K. Urban, 1998. The Osseoconductive Ability of
Plasmatically Deposited Hydroxyapatite and Pure Ti in Vitro and in Vivo, Quintessenz, 7, 5.

9. K. Urban, Z. Strnad, C. Povyal, P. Sponer, 1996. Tricalcium phosphate as a bone tissue substitute, Acta
Chir. Orthop. Traum. Cech., 63, 16.

10. V. Pavek, Z. Novak, Z. Strnad, D. Kudrnova, 1994. Clinical Applications of Bioactive Glass-ceramics
BAS-O for Filling Cyst Cavities in Stomatology, Biomaterials, 15(5), 353.




Intelligent  Design  of  GaSb  doped  Single  Crystals 

B. Stepanek, J. Sestak, J.J. Mares, J. Kristofik, V. Sestakova, P. Hubik

Institute of Physics, Academy of Sciences of the Czech Republic, Semiconductor
Department, Cukrovarnicka 10, 162 53 Praha 6, Czech Republic

Email: sestak@fzu.cz


ABSTRACT 

Doping of GaSb crystals with various elements did not yield the desired semiconductor properties until
a different design concept, allowing more intelligent processing, was introduced. It was shown that an
ionized hydrogen atmosphere can serve as a passivator over a wider range of tellurium concentrations, and
that an equilibrium between passivated and active donors is established according to the ratio of p- to
n-type dopants in the starting melt of GaSb crystals highly doped with tellurium. The Czochralski method
without encapsulant in a flowing atmosphere of ionized hydrogen was employed, and the resulting intrinsic
free carrier concentration was several times lower than that obtained using simple (molecular) hydrogen.


INTRODUCTION 

Gallium antimonide (GaSb) single crystals are promising candidates for a variety of military and civil
applications in the 2-5 and 8-14 µm ranges, among others infrared (IR) imaging sensors for missile and
surveillance systems (so-called focal plane arrays), fire detection, monitoring of environmental pollution,
and light-emitting diodes and lasers. In comparison with traditional GaAs, InSb and InP, for which
semi-insulating as well as conductive material is available, GaSb has a disadvantage: its conductivity is
usually very high owing to the large number of p-type native defects in the lattice, which puts very serious
limits on the construction of GaSb-based devices. To develop high-resistivity GaSb, intensive optimisation
of doping and growth conditions was undertaken [1,2]. Doping with the elements Cu, Zn, Cd, In,
Si, Ge, Sn, N, As, S, Te and Mn was investigated systematically [3-6] using the Czochralski method of growth
without encapsulant in a flowing hydrogen atmosphere, as well as by diffusion studies [7]. The limit of the
doping concentration of each dopant was measured, indicating the lowest solubility for S, N and Cu and the
highest for In, Ge, Te and As [8]. Extensive thermodynamic evaluations were also carried out to
analyse some binaries and ternaries, such as Ga-Sb-S (Te, Cu), etc. [3-5].

However, in spite of great efforts, the desired low-conductivity material was not obtained. Therefore a
special and rather unique method has been developed, using the passivation of active donors in n-type
(Te-doped) crystals by protons during growth carried out in an ionised hydrogen atmosphere
generated in situ by deuterium lamp radiation [9,10]. From a thermodynamic point of view, the ionized
hydrogen passivates a part of the donors and shifts the equilibrium between passivated and active donors,
depending on the starting concentration of n- and p-dopants, towards the intrinsic-like position. The
preparation of stable GaSb with sufficiently high resistivity would open unique perspectives in the
construction of integrated IR-optoelectronic devices of a new generation.

A major drawback of GaSb substrates is their high concentration of residual acceptors, which reaches
about 1.7 × 10¹⁷ cm⁻³ in pure undoped GaSb single crystals [11]. The acceptors are identified as
V_Ga-Ga_Sb complexes (where V is the vacancy) with a doubly ionized structure [12]. The resulting resistivity
of undoped GaSb single crystals is about 10⁻² Ω.cm. Many researchers have tried to reduce the residual
acceptor concentration and to increase the resistivity of GaSb.

The results achieved by growth under the ionized hydrogen atmosphere were sufficiently stimulating to
encourage us to continue our study of GaSb single crystal preparation with higher concentrations of
tellurium. The electrical behaviour of crystals grown in this way was compared with the earlier measurements
on lightly Te-doped crystals [13,14].


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




EXPERIMENT 

The Czochralski method without encapsulant in a flowing atmosphere of ionized hydrogen was used for
the preparation of GaSb single crystals. The hydrogen atmosphere suppressed creation of an oxide scum on
the melt surface [2,15]. GaSb pre-synthesised in vacuum (Ga/Sb = 50/50.05) from the elements (6N Ga and 6N
Sb) served as the starting material. Prior to crystal growth, the polycrystalline GaSb boule was cleaned by
grinding and etching in an acid solution. Because antimony evaporates during the whole growth process,
a small amount of antimony (0.1 wt.%) was added to the melt to prevent nonstoichiometric growth of the
crystals. It was found that about 1 × 10⁻³ mol of antimony could be lost in gaseous form from the apparatus
owing to the flowing hydrogen atmosphere [15].
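
As a rough plausibility check, the antimony added to the melt can be compared with the quoted evaporation loss. The sketch below assumes that the 0.1 wt.% excess refers to the whole charge of about 170 g listed in Table 1 (the paper does not state the reference mass explicitly) and uses the standard atomic weight of antimony; all variable names are illustrative.

```python
SB_MOLAR_MASS_G_PER_MOL = 121.76   # standard atomic weight of antimony
CHARGE_MASS_G = 170.0              # approximate GaSb charge weight (Table 1)
SB_EXCESS_WT_PERCENT = 0.1         # antimony added to the melt, wt.%
SB_LOST_MOL = 1e-3                 # antimony lost in gaseous form, mol

# Assumption: the 0.1 wt.% is taken relative to the full charge mass.
sb_added_g = CHARGE_MASS_G * SB_EXCESS_WT_PERCENT / 100.0
sb_lost_g = SB_LOST_MOL * SB_MOLAR_MASS_G_PER_MOL

print(f"Sb added: {sb_added_g:.3f} g")
print(f"Sb lost : {sb_lost_g:.3f} g")
print(f"added / lost = {sb_added_g / sb_lost_g:.2f}")
```

Under this assumption the added antimony is of the same order as, and somewhat larger than, the estimated loss, which is consistent with its stated purpose of preventing nonstoichiometric growth.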

The axial temperature gradients close to the solidification interface were about 35 K.cm⁻¹, which is very
low in comparison with other methods; for example, with the liquid encapsulated Czochralski method
(LEC) the gradients reached about 200 K.cm⁻¹. The horizontal gradient on the solid/liquid interface was
almost flat. The pulling rate was 12 mm.h⁻¹ with a seed rotation [15] of about 20 - 25 rpm. The crystals
were grown in the <111> direction; their length was about 70 mm and their diameter about 25 - 30 mm. The
growth conditions are summarised in Table 1.

The whole growth procedure, the complete Czochralski apparatus and the gradient profiles were described
in detail in our previous papers [2,6,11,15].


RESULTS  AND  DISCUSSION 

The  result  of  electrical  measurements  of  undoped  GaSb  crystals  grown  by  this  method  showed  that  the  use 
of  ionized  hydrogen  caused  an  increase  of  resistivity  and  the  free  carrier  concentration.  However,  there  was 
a  certain  asymmetry  in  acceptor  and  donor  passivation  in  GaSb,  because  the  Hall  concentration  increased. 
It  seemed  that  the  residual  donors  were  passivated  more  than  the  acceptors  [9,13]. 

Table 1: Growth conditions of the GaSb crystals grown using the Czochralski
method without encapsulant in a flowing hydrogen atmosphere.

GaSb charge weight                  ~ 170 g
Component ratio in charge, wt.%     Ga/Sb = 50.00/50.05
Atmosphere                          hydrogen - molecular or ionized
Oxygen impurity content             < 0.1 ppm
Kind of dopants                     almost elementary
Hydrogen flow rate                  70 cm³/hour
Pulling rate                        12 mm/hour
Rotation speed                      20 - 25 rpm
Seed orientation                    <111>B (Sb), direction to the melt
Crucible                            sand-blasted quartz
Crucible weight                     ~ 50 g

On the basis of this assumption, we attempted to prepare slightly Te-doped GaSb crystals (3.12 × 10¹⁷
atoms.cm⁻³ in the starting melt) using the same conditions as mentioned above [13]. The electrical
measurements of these crystals confirmed our assumption (see Table 2). We believe
that the donors were preferentially passivated. It is likely that an equilibrium between the passivated and
active donors in the GaSb structure was established depending on the growth length. A fraction of the donors
was passivated, as seen in the case of the Te-doped GaSb single crystals, where only a very small amount
of active tellurium was incorporated in GaSb, while the rest of the tellurium was inactive. The free carrier
concentration was almost the same along the whole length of the crystals and, although the GaSb boule was
doped with a typical n-type element, tellurium, the whole GaSb crystals were of p-type conductivity.
As the distribution coefficient of tellurium in GaSb (keff = 0.32) is lower than 1, the tellurium concentration
in a crystal should increase during the growth, compensating the acceptors until the conductivity of the
main part of the crystal boule becomes n-type. This did not happen during growth under the ionized
hydrogen atmosphere, as indicated by the Hall measurements. We assumed that the amount of passivated
donors increased with the donor concentration in such a way that the active donor concentration remained
almost constant during the growth. The slightly Te-doped crystals showed a trend to keep the concentration
of active donors constant. In the case of highly Te-doped GaSb crystals, however, this may not be valid.

For this reason, highly Te-doped GaSb crystals were grown under the same conditions as the crystals
mentioned above. A tellurium concentration of 4.52 × 10¹⁸ atoms.cm⁻³ was used, and the electrical
properties of crystals grown in a molecular hydrogen atmosphere were compared with those of crystals grown
in an ionized hydrogen atmosphere. The results showed that the free carrier concentration
was again lower in the crystals grown under the ionized atmosphere than in the crystals prepared in molecular
hydrogen (see Table 2). The difference reached several hundred per cent, so the
concentration of active donors (the active tellurium concentration) was several times lower with the
ionized atmosphere than with molecular hydrogen. This appears to confirm our assumption
that the ionized atmosphere causes preferential passivation of donors rather than acceptors. The
dislocation density was almost the same for GaSb grown in ionized and molecular
hydrogen atmospheres, in the range of 0 - 100 cm⁻² in both cases [6]. A quantitative evaluation
of the influence of this atmosphere is very difficult at present because only a few results are available.
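
The paper does not state how the "difference in free carrier concentration" column of Table 2 is defined. The sketch below tests one possible reading, offered purely as an assumption: the larger concentration divided by the smaller, expressed in per cent, with a positive sign when ionized hydrogen raised the concentration and a negative sign when it lowered it. The concentrations are those listed in Table 2; the function and variable names are illustrative.

```python
# Free carrier concentrations from Table 2, in cm^-3.
# Key: (Te in starting melt, crystal fraction x)
# Value: (molecular hydrogen, ionized hydrogen)
TABLE_2 = {
    ("undoped", 0.05): (1.72e17, 3.30e17),
    ("undoped", 0.95): (0.96e17, 3.50e17),
    ("3.12e17", 0.05): (0.69e17, 0.18e17),
    ("3.12e17", 0.95): (6.70e17, 0.23e17),
    ("4.52e18", 0.05): (13.25e17, 7.45e17),
    ("4.52e18", 0.95): (109.80e17, 23.10e17),
}


def signed_ratio_percent(n_molecular, n_ionized):
    """Assumed definition: larger/smaller in per cent, signed by direction."""
    if n_ionized >= n_molecular:
        return 100.0 * n_ionized / n_molecular
    return -100.0 * n_molecular / n_ionized


for (doping, x), (n_mol, n_ion) in TABLE_2.items():
    value = signed_ratio_percent(n_mol, n_ion)
    print(f"{doping:>8}  x = {x:.2f}  {value:+8.0f} %")
```

Under this reading the computed values fall close to most of the tabulated entries (for example, roughly +192 against the listed +190), although the x = 0.95 row of the 3.12 × 10¹⁷ crystal deviates more. Note also that this row compares an n-type with a p-type sample, so the ratio there mixes electron and hole concentrations.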

However, from a thermodynamic point of view, the ionized hydrogen atmosphere passivated only a part of
the donors and created an equilibrium between passivated and active donors. The state of this equilibrium
depends on the starting concentration of n- and p-type dopants. As soon as the concentration of p-type
elements is almost the same as or higher than that of n-type dopants, the n-type dopants are preferentially
passivated, which means that the ratio of the concentration of passivated donors to the concentration of active
donors approaches unity. For better understanding, it is necessary to add that in the case of heavily Te-doped
GaSb crystals (the whole boule is n-type) the concentration of p-type dopants is very low, and therefore the
concentration of active donors (n-type dopant) increases along the growth axis almost according to the
distribution coefficient of tellurium (keff = 0.32). For this reason, the free carrier concentration and mobility
increase as well, which differs markedly from the case of the p-type crystals (see Table 2). The value of the
mobility was very closely connected to the donor concentration in the case of n-type crystals.


Table 2: Comparison of electrical properties of undoped and Te-doped GaSb crystals
grown in an ionized and molecular hydrogen atmosphere.

Te in melt      Atmosphere          Part of  Fraction  Type of       Free carrier      Mobility   Resistivity  Difference in free
[atoms.cm⁻³]                        crystal  [x]       conductivity  conc. [cm⁻³]      [cm²/V.s]  [Ω.cm]       carrier conc. [%]

Undoped         Molecular hydrogen  top      0.05      P             1.72 × 10¹⁷       640        0.060
Undoped         Molecular hydrogen  bottom   0.95      P             0.96 × 10¹⁷       550        0.062
Undoped         Ionized hydrogen    top      0.05      P             3.30 × 10¹⁷       300        0.102        +190
Undoped         Ionized hydrogen    bottom   0.95      P             3.50 × 10¹⁷       20         0.807        +365
3.12 × 10¹⁷     Molecular hydrogen  top      0.05      P             0.69 × 10¹⁷       1200       0.0032
3.12 × 10¹⁷     Molecular hydrogen  bottom   0.95      N             6.70 × 10¹⁷       2300       0.0018
3.12 × 10¹⁷     Ionized hydrogen    top      0.05      P             0.18 × 10¹⁷       370        0.951        -380
3.12 × 10¹⁷     Ionized hydrogen    bottom   0.95      P             0.23 × 10¹⁷       390        0.705        -3010
4.52 × 10¹⁸     Molecular hydrogen  top      0.05      N             13.25 × 10¹⁷      2860       0.0012
4.52 × 10¹⁸     Molecular hydrogen  bottom   0.95      N             109.80 × 10¹⁷     3400       0.0010
4.52 × 10¹⁸     Ionized hydrogen    top      0.05      N             7.45 × 10¹⁷       2100       0.0028       -180
4.52 × 10¹⁸     Ionized hydrogen    bottom   0.95      N             23.10 × 10¹⁷      2970       0.0013       -480


Our assumption that an equilibrium is created between passivated and active donors is supported by the
shapes of the free carrier concentration profiles along the growth axis, which depend on the growth atmosphere.
The profile of the concentration changes in the crystals grown in a molecular hydrogen atmosphere is
fully described by the Pfann equation, that is, the profile has an exponential shape (see Fig. 1), whereas the
distribution of free carrier concentration in the crystals grown in an ionized hydrogen atmosphere is almost
flat, that is, the curves have a linear character (see Fig. 2). We suppose that the equilibrium being created
between passivated and active donors also balances the increasing concentration of acceptors
and moderates the influence of the distribution coefficient. These two factors act against each other
and their influences compensate. For this reason the free carrier concentration changes linearly along
the growth axis and the Pfann equation does not correspond with the Hall measurements.
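
The paper does not write the Pfann equation out. The sketch below assumes the standard normal-freezing form, C_s(x) = keff * C_0 * (1 - x)^(keff - 1), where x is the solidified fraction, C_0 the dopant concentration in the starting melt and keff = 0.32 the distribution coefficient of tellurium quoted in the text. The function name and the chosen fractions are illustrative.

```python
def normal_freezing(c0, k_eff, x):
    """Dopant concentration in the solid at solidified fraction x (0 <= x < 1).

    Assumed form of the Pfann (normal-freezing) equation:
    C_s(x) = k_eff * C_0 * (1 - x) ** (k_eff - 1)
    """
    if not 0.0 <= x < 1.0:
        raise ValueError("x must satisfy 0 <= x < 1")
    return k_eff * c0 * (1.0 - x) ** (k_eff - 1.0)


K_EFF_TE = 0.32     # distribution coefficient of Te in GaSb (from the text)
C0_HIGH_TE = 4.52e18  # Te atoms per cm^3 in the melt, highly doped case

for x in (0.05, 0.25, 0.50, 0.75, 0.95):
    c_s = normal_freezing(C0_HIGH_TE, K_EFF_TE, x)
    print(f"x = {x:.2f}: C_s = {c_s:.2e} cm^-3")
```

For the highly doped melt this gives roughly 1.5 × 10¹⁸ cm⁻³ at x = 0.05 and 1.1 × 10¹⁹ cm⁻³ at x = 0.95, the same order as the free carrier concentrations listed in Table 2 for growth in molecular hydrogen. The comparison is only indicative, since the free carrier concentration also depends on compensation by residual acceptors.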




CONCLUSION 

For satisfactory growth of crystals with a free carrier concentration lower than 10¹⁵ cm⁻³ (which is the
desired goal of scientists working with GaSb substrates) it is necessary to study a whole series of
crystals with various concentrations of tellurium in the starting melt. According to our preliminary
calculation, the optimal concentration should be (6 - 8) × 10¹⁷ atoms.cm⁻³ of tellurium, and crystals prepared
in this way should show a free carrier concentration of about 10¹⁴ cm⁻³ along the whole crystal boule (see our
predicted profile of the free carrier concentration in such a Te-doped GaSb single crystal, Fig. 3).


ACKNOWLEDGEMENT 

The study was carried out with the support of the Grant Agency of the Czech Republic, projects
100/98/0034, 104/97/0589 and 202/99/0410.


[Plot: free carrier concentration versus crystal fraction [x]; curves for undoped, Te-low doping and Te-high doping crystals.]

Fig. 1. Free carrier concentration distribution along the growth axis in the GaSb single crystals grown using
the Czochralski method without encapsulant in a molecular hydrogen atmosphere.




Fig. 2. Free carrier concentration distribution along the growth axis in the GaSb single crystals grown using
the Czochralski method without encapsulant in an ionized hydrogen atmosphere.


Fig. 3. Predicted distribution of free carrier concentration along the growth axis in the GaSb crystals grown
in an ionized hydrogen atmosphere.




REFERENCES 

1. B. Stepanek, V. Sestakova, J. Sestak, 1993. Comparative analysis of GaSb single crystal growth
techniques. Inorganic Mater., 29, 1071-1075.

2. V. Sestakova, B. Stepanek, J. Sestak, 1996. Various methods for the growth of GaSb single crystals. J.
Cryst. Growth, 165, 159-162.

3. V. Sestakova, B. Stepanek, J. Sestak, P. Hubik, V. Smid, 1993. Thermodynamic aspects of (Te,S)-
double-doped GaSb crystal growth. Mater. Sci. Eng., B2, 14-18.

4. J. Sestak, J. Leitner, H. Yokokawa, B. Stepanek, 1994. Thermodynamics and phase equilibria data in
the S-Ga-Sb system auxiliary to the growth of doped GaSb single crystals. Thermochim. Acta, 245, 189-206.

5. J. Sestak, V. Sestakova, Z. Zivkovic, 1995. Estimation of activity data for the Ga-Sb-S binaries
regarding the doped GaSb semiconductor crystals. Pure Appl. Chem., 67, 1885-1889.

6. V. Sestakova, B. Stepanek, 1995. Doping of GaSb single crystals with various elements. J. Cryst.
Growth, 146, 87-91.

7. J. Mimkes, V. Sestakova, K.M. Nassr, M. Lubbers, B. Stepanek, 1998. Diffusion mobility and defect
analysis in GaSb. J. Cryst. Growth, 187, 355-362.

8. V. Sestakova, B. Stepanek, J. Sestak, 1999. Estimation of doping limit of some elements in GaSb single
crystals. Proc. SPIE, Single Crystal Growth, Characterisation, and Applications, 3724, 125-129.

9. V. Sestakova, B. Stepanek, J.J. Mares, J. Sestak, 1996. Decrease in free carrier concentration in GaSb
crystals using an ionized hydrogen atmosphere. Materials Chem. and Phys., 45, 39-42.

10. B. Stepanek, V. Sestakova, J. Sestak, 1996. Growth of GaSb single crystals with low carrier
concentration. Proc. SPIE, Solid State Crystals: Growth and Characterisation, 3178, 64-67.

11. B. Stepanek, V. Sestakova, 1992. Czochralski grown concentration profiles in the undoped and Te-
doped GaSb single crystals. Thermochim. Acta, 209, 285-294.

12. Y.J. Van der Meulen, 1967. Growth properties of GaSb: The structure of the residual acceptor centres.
Phys. Chem. Solids, 28, 25-32.

13. V. Sestakova, B. Stepanek, J.J. Mares, J. Sestak, 1994. Hydrogen passivation of residual acceptors in
GaSb single crystals. J. Cryst. Growth, 140, 426-428.

14. B. Stepanek, V. Sestakova, J. Sestak, 1994. More progressive technology of GaSb single crystal growth.
Cz. J. Phys., 47, 693-697.

15. F. Moravec, V. Sestakova, B. Stepanek, V. Charvat, 1989. Crystal growth and dislocation structure of
gallium antimonide. Crystal Res. Technol., 24, 275-281.




Intelligent  Image  Analysis  Applications 




Astronomical Image Processing -
Applications to Ultra-Faint Imaging of Small, Moving,
Solar System Bodies: Comets and Near-Earth-Objects

Karen  J.  Meech 


University  of  Hawaii,  Institute  for  Astronomy 
2680  Woodlawn  Drive,  Honolulu,  96822,  HI,  USA 
Tel: 808-956-96828 Fax: 808-956-9580 Email: meech@ifa.hawaii.edu


ABSTRACT 

Modern electronic detectors, or charge-coupled devices (CCDs), are being used on large optical ground-based
and space-based telescopes to image ultra-faint astronomical sources, ranging from small solar system bodies
to diffuse gas and dust in the interstellar medium to very distant galaxies. Observations of all these objects are
challenging, but the small solar system bodies create special demands on image processing because of their
motion relative to background objects.

Comets  are  "dirty  snowballs"  —  archaeological  remnants  of  the  birth  of  the  solar  system  from  its  primordial 
cloud  of  gas  and  dust.  Because  many  are  stored  at  large  distances  from  the  sun  and  may  never  have  been 
significantly  heated  by  the  sun,  they  may  contain  chemical  and  physical  information  from  this  era  early  in 
our  history.  As  a  cometary  orbit  brings  it  into  the  vicinity  of  the  sun,  the  surface  volatiles  heat  up  and 
sublimate,  dragging  the  refractory  materials  from  the  surface  and  creating  the  features  we  associate  with 
comets.  It  is  when  the  comets  are  "active"  and  close  to  the  sun  (and  hence  bright)  that  the  majority  of 
observations  are  made.  However,  in  the  active  phase,  comets  undergo  physical  and  chemical  evolution 
which  makes  primordial  information  difficult  to  discern  from  the  evolutionary  changes.  Detecting  an 
inactive  comet's  nucleus  (which  may  be  only  a  few  km  in  diameter,  and  which  reflects  only  a  few  percent 
of  the  incident  sunlight)  at  very  large  distances  from  the  sun  is  an  extremely  challenging  observation  not 
only  because  of  the  faintness  of  the  object,  but  because  of  its  motion  across  the  detector  relative  to  the  field 
stars.  It  is  even  more  challenging  to  attempt  to  image  the  first  onset  or  the  cessation  of  activity  in  the  form 
of  a  very  low  surface  brightness  dust  cloud  around  the  comet.  The  distances  at  which  these  processes  occur 
give  us  information  about  the  chemistry  of  the  comet. 

In  this  paper,  techniques  for  pushing  CCD  detectors  on  ground-based  and  space-based  telescopes  to  the 
limits  of  their  imaging  capabilities  to  detect  objects  hundreds  to  thousands  of  times  fainter  than  the  dark 
night  sky  will  be  discussed.  We  will  also  discuss  automated  techniques  for  searching  the  world's  largest 
astronomical  CCD  mosaics  for  moving  objects.  The  application  of  these  techniques  to  understand  the  early 
history  of  our  solar  system,  and  toward  discovering  small  fast-moving  comets  and  asteroidal  bodies  in  the 
near-Earth  vicinity  will  be  presented. 






A  High  Performance  Computing  Algorithm 
to  Improve  In-line  Holography 

Hesham  Eldeib 


Computer  and  System  Department 
Electronic  Research  Institute,  National  Research  Center 
Tahrir Street, Dokki, Giza, Egypt
Email: eldeeb@eri.sci.eg


ABSTRACT 

A new holographic algorithm, called HPCHolo (High Performance Computing Holography), is suggested to
reduce the computing time by orders of magnitude in the computer generation and reconstruction of
holograms. It is proposed to speed up the generation of the hologram by a ray-tracing technique
and the reconstruction of the virtual image from that hologram by a correlation technique. We applied the new
algorithm to two-dimensional (2D) and three-dimensional (3D) object models on a PowerXplorer computer
(a multiprocessor message-passing machine). We study the relation between the hologram size, the
resolution of the reconstructed image and the computational time for both the construction and reconstruction
processes, and obtain satisfactory results.


INTRODUCTION 

The concept of holography was proposed and demonstrated by Gabor over forty years ago. The realization of this idea was made possible by the advent of laser light. Since then, a great deal of effort has been devoted to the construction of holographic systems. Considerable interest has arisen in exploring the possibilities of applying the holographic principles of three-dimensional (3D) image storage and reconstruction to digital-computer displays and to other applications, e.g., tomography, nuclear magnetic imaging, astronomy, computer-aided design and 3D communications. With such techniques it is possible to generate holograms capable of displaying 3D images of objects that never existed in reality, since only a mathematical description of the object is necessary.

Most  of  the  computer-generated  holograms  (CGHs)  described  in  the  literature  are  simply  two-dimensional 
(2D)  Fourier  transforms  of  a  two-dimensional  image  plane.  Although  the  fringe  pattern  on  the  plate  is 
calculated  by  a  computer,  what  is  calculated  is  a  Fourier  transform.  In  these  holograms,  the  principal  mode 
of  reconstruction  is  Fraunhofer  diffraction,  which  limits  low-frequency  spatial  information  to  the  center  of 
the  interference  pattern  and  higher  frequency  components  to  the  edges.  Alternatively,  holograms  generated 
via  ray  tracing  distribute  information  so  that  different  areas  of  the  plate  correspond  to  different  perspectives 
of the object. In the past, however, implementation of this algorithm has not been trivial, since the quantity of calculations quickly approached the limits of computing technology. Large holograms, especially those with sizes suitable for display applications, are almost impossible to achieve with a single processor [2]. Here, we employ an efficient and fast algorithm, called HPCHolo (High Performance Computing Holography), to accelerate the generation of holograms by the ray-tracing technique. We also use the same algorithm to accelerate the reconstruction of the virtual image from those holograms by the correlation technique, which has been shown to give better image quality than the FFT method [1].

The  machine  used  to  generate  the  hologram  and  reconstruct  the  virtual  image  is  a  PowerXplorer  consisting 
of  8  nodes;  each  with  32  Mbytes  of  memory.  The  PowerXplorer  has  no  internal  disk  storage  and  the  only 
I/O  interface  is  through  communication  links.  It  runs  in  a  cross  development  environment,  with  a  front-end 
workstation for code development, downloading, and collecting results. The PowerXplorer is a MIMD (Multiple Instruction, Multiple Data) parallel computer. Each processor is an autonomous computer that communicates with other processors via message passing [3].




HIGH  PERFORMANCE  COMPUTING  RAY-TRACING  ALGORITHM 
FOR  HOLOGRAM  GENERATION 


Ray-tracing  algorithm 

Ray-tracing  is  a  convenient  way  to  model  optical  systems  using  a  computer.  In  computer  simulations  of 
holograms  by  ray  tracing,  the  complex  amplitudes  that  reach  the  plane  of  the  photographic  film  from 
different  directions  are  summed  up. 

The basic equation for hologram generation by ray tracing is [4]:

$$I_\alpha = \left| 1 + \sum_j \frac{A_j \exp(i k R_{\alpha j})}{R_{\alpha j}} \right|^2 \qquad (1)$$

where $I_\alpha$ is the hologram intensity at a point $\alpha$ on the hologram plane, the indices $i$, $j$ label the coordinates of the original objects in 3D space or of the virtual image in 3D space, $A_j$ is the beam reflected from the object, $k$ is the wave number of the beam, and $R_{\alpha j}$ is the distance from an object point $(x, y, z)$ to the hologram plane:

$$R(x, y, z; \xi, \eta) = \left[ (x - \xi)^2 + (y - \eta)^2 + z^2 \right]^{1/2} \qquad (2)$$

where $(\xi, \eta)$ is the coordinate on the hologram plane. The $x$-$y$ plane is parallel to the $\xi$-$\eta$ plane, the $z$-axis is chosen to be perpendicular to the hologram, and the hologram is set at $z = 0$.


The term "1" in Equation 1 represents the reference beam. For simplicity, we use a reference beam of normal incidence, as in in-line (Gabor-type) holography, in the following discussion.
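As an illustration of Equations 1 and 2, the following minimal Python sketch adds the reference beam and the spherical wave from each object point at every hologram pixel. The function name, the square pixel grid centred on the origin, and the `wavelength` and `pitch` parameters with their default values are assumptions made for this example; they are not taken from the HPCHolo implementation.

```python
import cmath
import math


def hologram_intensity(points, size, wavelength=0.5, pitch=1.0):
    """Return a size x size grid of in-line hologram intensities (Eq. 1).

    points: list of (x, y, z, amplitude) object points with z > 0.
    wavelength, pitch: illustrative values, in the same length unit as points.
    """
    k = 2.0 * math.pi / wavelength        # wave number of the beam
    half = (size - 1) / 2.0               # centre the hologram on the origin
    hologram = []
    for row in range(size):
        eta = (row - half) * pitch
        line = []
        for col in range(size):
            xi = (col - half) * pitch
            field = 1.0 + 0.0j            # the "1" is the reference beam
            for x, y, z, amp in points:
                # Eq. 2: distance from the object point to the pixel
                r = math.sqrt((x - xi) ** 2 + (y - eta) ** 2 + z ** 2)
                field += amp * cmath.exp(1j * k * r) / r
            line.append(abs(field) ** 2)  # intensity is the squared modulus
        hologram.append(line)
    return hologram
```

The cost grows with the number of hologram pixels times the number of object points, which is why large holograms motivate the parallel approach described next.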


HPCHolo algorithm for hologram generation

In this paper we improve the above algorithm to accelerate the calculation of Equation 1 by using the HPCHolo algorithm. The flowchart of the HPCHolo algorithm used to generate holograms is shown in Figure 1. The following subsection gives a brief description of this algorithm.


Get node parameters: The processors of a PARIX partition are arranged as a 2D or 3D grid. Each processor must identify its own position within the partition (the processor's ID) in order to know what to do. A set of global data kept on each node allows this identification.


Link establishment: The basic communication facility in PARIX is the virtual link. Virtual links overcome the restriction of having only four physical links per processor by allowing any two processors to communicate directly, even if there is no physical link between them.


Reading  and  broadcasting  input  data:  The  main  difference  between  optical  holography  and  CGHs  is  that 
the  object  may  not  physically  exist.  Therefore,  before  starting  the  generation  process,  the  specification  of 
the  desired  object  should  be  defined  and  may  be  recorded  in  a  file  or  directly  fed  via  the  keyboard. 

Hologram  generation:  At  this  point  each  processor  has  its  assigned  elements.  It  will  begin  to  execute 
the  code  of  the  algorithm  at  the  same  time  with  the  other  processors  and  generate  its  assigned  part  of  the 
hologram. 


Broadcasting and storing hologram data: All processors send their output data to one processor through the links already established. This processor then saves the output data to a file opened on the host computer.


Fig. 1. Flowchart of HPCHolo algorithm for hologram generation
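The steps above can be sketched in Python as a split, compute, and gather pattern. This is only a sketch under stated assumptions: the paper does not say how hologram elements are assigned to processors, so the contiguous row blocks used here, the helper names (`split_rows`, `generate_block`, `gather`, `run`) and the sequential loop standing in for the PARIX nodes are all hypothetical.

```python
def split_rows(n_rows, n_procs):
    """Split row indices 0..n_rows-1 into n_procs contiguous blocks."""
    base, extra = divmod(n_rows, n_procs)
    blocks, start = [], 0
    for rank in range(n_procs):
        stop = start + base + (1 if rank < extra else 0)
        blocks.append(range(start, stop))
        start = stop
    return blocks


def generate_block(rows, size, pixel_fn):
    """Work of one node: compute every pixel of its assigned rows."""
    return [(row, [pixel_fn(row, col) for col in range(size)]) for row in rows]


def gather(partial_results, size):
    """Collect the rows produced by all nodes on one node, in row order."""
    hologram = [None] * size
    for block in partial_results:
        for row, values in block:
            hologram[row] = values
    return hologram


def run(size, n_procs, pixel_fn):
    """Sequential stand-in for the parallel run on n_procs nodes."""
    blocks = split_rows(size, n_procs)
    partial = [generate_block(rows, size, pixel_fn) for rows in blocks]
    return gather(partial, size)
```

Because each pixel depends only on the object description, the blocks need no communication with each other until the final gather, which is what makes the problem suitable for a message-passing machine.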

HIGH  PERFORMANCE  COMPUTING  ALGORITHMS 
FOR  VIRTUAL  IMAGE  RECONSTRUCTION 

Correlation  method 

In previous papers [1,5] we proposed a correlation method to reconstruct the 3D virtual image by computer. The method is based on the correlation between a hologram of a point source and a hologram of real objects, and it has been shown to give better virtual image resolution than methods that use the Fast Fourier Transform (FFT) [1]. The FFT approach, however, is much faster.

In order to retrieve the 3D pattern $A(x, y, z)$, we perform the calculation

$$\varphi_i = \sum_\alpha I_\alpha \cos(k R_{i\alpha}) / R_{i\alpha}, \qquad (3)$$

where the summation over $\alpha$ is performed on the surface of the hologram. This process can retrieve the image, that is, $A(x, y, z) \sim \varphi_i$ [5].
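A direct evaluation of Equation 3 on one depth plane might look like the following Python sketch. The function name, the grid layout, and the `wavelength` and `pitch` defaults are assumptions for illustration only, and the hologram is taken to be a square list of lists of intensities.

```python
import math


def reconstruct_plane(hologram, z, out_size, wavelength=0.5, pitch=1.0):
    """Evaluate Eq. 3 on an out_size x out_size grid in the plane at depth z."""
    k = 2.0 * math.pi / wavelength
    n = len(hologram)
    h_half = (n - 1) / 2.0                # centre of the hologram grid
    o_half = (out_size - 1) / 2.0         # centre of the reconstruction grid
    image = []
    for iy in range(out_size):
        y = (iy - o_half) * pitch
        line = []
        for ix in range(out_size):
            x = (ix - o_half) * pitch
            phi = 0.0
            # Sum over every hologram pixel (the index alpha in Eq. 3)
            for row in range(n):
                eta = (row - h_half) * pitch
                for col in range(n):
                    xi = (col - h_half) * pitch
                    r = math.sqrt((x - xi) ** 2 + (y - eta) ** 2 + z ** 2)
                    phi += hologram[row][col] * math.cos(k * r) / r
            line.append(phi)
        image.append(line)
    return image
```

Every reconstructed point requires a sum over the whole hologram, so the cost is the number of image points times the number of hologram pixels; this is the calculation that the parallel version distributes.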

HPCHolo  algorithm  for  reconstruction  of  virtual  image 

In  this  paper,  we  improve  on  the  above  algorithm  to  accelerate  the  calculation  of  Equation  3  by  using  the 
HPCHolo  algorithm.  The  flowchart  of  the  HPCHolo  algorithm  used  to  reconstruct  the  virtual  image  is  the 
same  as  shown  in  Figure  1,  except  for  the  step  of  virtual  image  reconstruction  instead  of  hologram 
generation. 




In  virtual  image  reconstruction,  each  processor  has  its  assigned  elements.  It  begins  to  execute  the  code  of 
the  algorithm  simultaneously  with  other  processors  to  reconstruct  its  assigned  part  of  the  virtual  image. 
Then  the  output  data  is  broadcast  to  one  processor  at  the  same  time.  Finally,  the  algorithm  stores  this  virtual 
image  data  to  a  file  on  the  host  computer. 


EXPERIMENTAL  VERIFICATION  OF  THE  ALGORITHM 

Hologram  generation 

In the 2D case studies, we take a discrete circle consisting of 20 points in space. In the 3D case studies, we take an object consisting of five circles situated in planes at different depths to form a cone shape. The radii are 10, 30, 50, 70, and 90 pixels, with the smallest circle farthest from the hologram plane. We generate holograms of 64x64, 128x128, 256x256, 512x512, and 1024x1024 pixels for these two objects.

We  found  that  the  fringe  patterns  have  more  diffraction  details  with  an  increase  in  hologram  size.  We  will 
show  the  response  of  this  effect  in  the  reconstructed  image  in  the  next  subsection. 

Reconstruction  of  the  virtual  image 

We  have  reconstructed  the  holograms  obtained  for  both  the  discrete  circle  and  the  3D  object.  We  take  the 
reconstruction  area  fixed  as  64x64  for  2D  objects  and  256x256  for  3D  objects.  Figure  3  shows  the 
reconstructed  image  for  a  discrete  circle  with  different  hologram  sizes.  Figure  4  shows  the  reconstructed 
image  for  the  3D  object  with  cross-sectional  plane  at  Z=90  with  different  hologram  sizes. 


Fig. 3. 3D intensity representation of the virtual image for the discrete 2D circle with different hologram sizes: (a) 64x64, (b) 128x128, (c) 256x256.


Fig. 4. The virtual image of the 3D object with different hologram sizes at the cross-sectional plane Z=90: (a) 256x256, (b) 512x512, (c) 1024x1024.




From  Figure  4,  we  found  that  the  appearance  (resolution)  of  the  reconstructed  image  increases  with  an 
increase  in  hologram  size.  For  a  2D  object,  we  found  there  is  no  improvement  with  hologram  size  above  a 
resolution  of  256x256  due  to  the  simple  object  we  have  chosen. 


PERFORMANCE  EVALUATION 

In this section, we discuss the two main factors used to evaluate the performance of the HPCHolo algorithm. The first factor is the computation time of the proposed algorithm. The second factor is the resolution of the reconstructed image.

Computation  time 

Figure 5 and Figure 6 show the computation time of hologram generation and reconstruction, respectively, for different hologram sizes on 1, 2, 4 and 8 processors, using the 2D object as the measured case.


Fig. 5. Computation time of HPCHolo algorithm for hologram generation with different sizes applied to 1, 2, 4 and 8 processors with a 2D object.


Fig. 6. Computation time of HPCHolo algorithm for reconstruction of virtual image with different sizes applied to 1, 2, 4 and 8 processors with a 2D object.


From Figures 5 and 6, we can see that the maximum gain from the HPCHolo algorithm comes at a size of 1024x1024. In terms of speed, the HPCHolo algorithm is slower than the conventional algorithm when a small hologram is used. However, as the hologram size increases to sizes suitable for real-object applications, the proposed algorithm becomes efficient and fast.
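The behaviour described here is what the usual speedup and efficiency measures would show. The Python sketch below defines them together with a very simple cost model (compute time divided among the processors plus a fixed communication overhead). The model and every timing used with it are hypothetical and are not measurements from Figures 5 and 6.

```python
def speedup(t_serial, t_parallel):
    """Ratio of single-processor time to multi-processor time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Speedup per processor; 1.0 would be ideal scaling."""
    return speedup(t_serial, t_parallel) / n_procs


def modeled_time(t_serial, n_procs, t_overhead):
    """Assumed model: perfectly divided compute plus fixed overhead."""
    return t_serial / n_procs + t_overhead
```

With a hypothetical overhead of 2.5 s on 8 processors, a job that takes 1 s serially is modeled at 2.625 s (slower than serial), while an 80 s job is modeled at 12.5 s (speedup 6.4, efficiency 0.8), which matches the qualitative trend reported for small and large holograms.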




Resolution  of  the  reconstructed  image 

The results demonstrate that this new computer holography algorithm improves performance. This paper has presented an extensive case study for evaluating the computer generation and reconstruction of holograms. As for the virtual image quality, the HPCHolo algorithm gives excellent results for both 2D and 3D objects. The results show that the larger the hologram size, the better the virtual image resolution, but the improvement saturates at a certain limit (512x512 in our case study of 2D objects).


CONCLUSION 

We have provided a high-speed algorithm for computer-aided holography on a parallel processing machine. Compared with the conventional ray-tracing algorithm, the HPCHolo algorithm is faster, especially for large holograms of sizes suitable for display applications, which are almost impossible to achieve using conventional means. In the reconstruction of the virtual image, the HPCHolo algorithm reduces the calculation time significantly compared with the full calculation of Equation 3.

The hologram size in the HPCHolo algorithm must be chosen so that the image quality is comparable with the full calculation scheme while the computation time is greatly reduced. In general, the results indicate that this new computer holography algorithm dramatically decreases the computation time of hologram generation, with much better virtual image quality than the conventional method.


REFERENCES

1. Eldeib, H., and Yabe, T., 1996. A Fast Computer Holography System and Its Experimental Verification. Journal of Computer Modeling and Simulation in Engineering, 1, 251-261.

2. Koren, G., Polack, F., and Joyeux, D., 1993. Iterative algorithms for twin-image elimination in in-line holography using finite-support constraints. J. Optical Soc. Am. A, 10, 423-433.

3. Parsytec, 1996. Parsytec GC/Power Plus, Power Xplorer and CC-Series Technical Data. Parsytec, Technical Report.

4. Stein, A.D., Wang, Z., and Leigh, J.S., Jr., 1992. Computer-generated holograms: a simplified ray-tracing approach. Computers in Physics, 6, 289-292.

5. Yabe, T., Ito, T., and Okazaki, M., 1993. Holography Machine HORN-1 for Computer-Aided Retrieval of Virtual Three-Dimensional Images. Japanese J. Appl. Phys., 32, 261-263.




Human  Face  Detection  System  by  KenzanNET 
with  Preprocess  Analyzing  Hyperspectral  Image 

Takakazu Chashikawa * **, Keizo Fujii * and Yoshiyasu Takefuji *

* Graduate School of Media and Governance, Keio University, 5322 Endo, Fujisawa 252-0816, Japan
** Nittan Co., Ltd., 1-11-6 Hatagaya, Shibuya, 151-8535, Japan

ABSTRACT 

This paper proposes a neural network system to detect human faces. Our scheme is composed of a preprocess and KenzanNET. The preprocess analyzes hyperspectral images by using a hybrid self-organizing classification model to extract the skin area, and makes a facial candidate pattern based on the extracted skin area. KenzanNET discriminates a face from other body parts. KenzanNET is a kind of feed-forward neural network derived from CombNET [1] and improved by an additional learning function. Under various conditions of background, room brightness, and distance between people and the camera, our system can detect human faces with 76.9% accuracy.

INTRODUCTION 

Face recognition systems based on neural networks have been studied for a long time, and many models have been proposed. However, it is difficult to develop a flexible system with high performance and human-like ability, because visual processing in the human brain is not yet fully understood. Even lower animals, however, can detect a target object with flexibility. It is well known that insects, with only one millionth of the human brain, have flexible recognition abilities because of special sensors used to detect features of a target object. For example, a bee can sense ultraviolet emissions to detect flower nectar, and a mosquito can sense carbon dioxide to detect human skin. We think special sensors help insects to detect objects. If we can install a special sensor to detect features of a human face, the system may have the same flexibility as humans.

We developed a new camera for the human face detection system that captures two images: a visible (color) image and an infrared image. The infrared image shows the thermal pattern of a human and is not easily affected by changes in imaging conditions. Our system therefore analyzes an infrared image together with a visible image, a pair we call the hyperspectral image. Our facial detection process is composed of two schemes, a preprocess and a neural network. The preprocess analyzes the hyperspectral image to make a facial candidate pattern, and the neural network discriminates between a face and other parts of the body in the facial candidate pattern. The preprocess thus converts the face detection problem into a pattern (symbol-like) recognition problem.

In the neural network scheme, a conventional feed-forward neural network with a back-propagation learning model does not perform well because of local minima. We tried to use CombNET [1] in our system [2]. The strategy of CombNET is to divide the problem into several easy problems, which we call classified learning. These problems can be solved easily by a conventional neural network model. This version of our system has no convergence problem, but the quality of human face detection is not good enough, because templates must be prepared as teachers in advance, which is generally too difficult.

In this paper, we improve our system by adding teaching signals. We propose a new neural network model, KenzanNET, that uses classified learning and an additional learning strategy. The classified learning strategy is based on CombNET, and the additional learning strategy is based on refining the teaching signals by vector quantization. KenzanNET consists of a refining teacher part, a clustering part and several recognition parts. The refining teacher part is added for additional learning and is what distinguishes KenzanNET from CombNET. Facial candidate patterns, which include three types of information (skin, hair and background), are fed to KenzanNET. The facial candidate patterns include many parts of the body such as the face, hands, legs, and so on. KenzanNET discriminates between a face and other parts of the body.




SYSTEM  DESCRIPTION 

An overview of the architecture is shown in Fig. 1. The preprocess detects human skin and decides facial candidate patterns. KenzanNET recognizes the facial candidate patterns.


Fig. 1. Overview of proposed system

Hardware 

In  our  system,  the  camera  is  composed  of  a  visible  camera  and  an  infrared  camera  as  shown  in  Fig.  2.  This 
camera  can  obtain  a  hyperspectral  image  in  the  same  area  simultaneously  as  shown  in  Fig.  3. 


Fig. 2. Hyperspectral camera (labelled parts: infrared camera, visible camera, half-mirror)


Preprocess 

The objective of the preprocess is to form facial candidate patterns. The preprocess is composed of two parts: a skin detection part and a segmentation part.

The skin detection part uses a hybrid self-organization algorithm in which Kohonen's self-organization model and the maximum neuron model are combined [3,4,5,6]. We use the maximum neuron model in the first stage until the state of the system converges to a local minimum; Kohonen's self-organization algorithm is then used to obtain the solution by escaping from the local minimum of the first stage. The algorithm shortens the computation time without the burden of parameter tuning. The result of skin detection is shown in Fig. 4. White is assigned to the extracted skin area.


The segmentation part decides head candidate patterns based on the skin area as follows: checking criteria, making a pattern that includes three kinds of information (skin, hair, and background), and normalizing its size to 16x12. The criteria are the size of the skin area, the height:width ratio of the skin area, the symmetry of the skin area, and the amount of hair and its position relative to the center. Outputs that satisfy the criteria are fed to KenzanNET as facial candidate patterns (Fig. 5).




KenzanNET 

KenzanNET uses a classified learning strategy and an additional learning strategy. KenzanNET consists of three parts: a refining teaching part, a clustering part and recognition parts, as shown in Fig. 6. The clustering part and recognition parts carry out the classified learning strategy, and the refining teaching part carries out the additional learning strategy. The number of networks in the recognition parts equals the number of codebook vectors in the clustering part. KenzanNET differs from CombNET in that a refining teaching part is added and the clustering part uses the maximum neural model.


In the refining teaching part, the maximum neural model (see Appendix) vector-quantizes the teaching signals, and the trained codebook vectors are used as new teaching signals. The new teaching signals are good enough for training the neural network, because the number of teachers is reduced while the new teachers still retain the features of the old teachers statistically. In the clustering part, the input data is divided into several groups. In the recognition part, a three-layered hierarchical network is prepared for each group, and the back-propagation scheme is used to train each network. Each neural network does not need a very complex mapping function, so it is easy to train.


The learning process is as follows. In the clustering part, the refined teaching signals are used for training. After the clustering part is trained, all input data is classified into several groups according to the best match with the codebook vectors of the neurons. Since input data containing similar patterns is assigned to the same group, it is fed to the same network in the recognition part. The trained clustering part is shown in Fig. 8. Next, the recognition part is trained for each sub-space to build a mapping function that discriminates a face from other parts of the body.

The decision process is as follows. When input data is given to the clustering network, the several neurons that give the smallest distance between the input vector and the codebook vector are selected. The input data is then fed to the networks in the recognition part that are connected to the selected neurons in the clustering part. The output of the networks is between 0 and 1. If the output is above 0.6, the input is recognized as a face; otherwise it is recognized as another part of the body. If the output indicates a face, the system marks its position with a rectangle, as shown in Fig. 7.
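The decision process can be sketched in Python as follows. The names `detect_face`, `codebook`, `networks` and `n_select` are hypothetical, each recognition network is represented by any callable that returns a value between 0 and 1, and combining the selected networks by taking their maximum output is an assumption, since the text does not say how several outputs are merged. Only the 0.6 threshold comes from the paper.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def detect_face(pattern, codebook, networks, n_select=1, threshold=0.6):
    """Return True if the candidate pattern is recognized as a face.

    codebook[i] is the codebook vector of clustering neuron i and
    networks[i] is the recognition network connected to that neuron.
    """
    # Clustering part: rank neurons by distance to the input pattern
    ranked = sorted(range(len(codebook)),
                    key=lambda i: squared_distance(pattern, codebook[i]))
    selected = ranked[:n_select]
    # Recognition part: query only the networks of the selected neurons
    outputs = [networks[i](pattern) for i in selected]
    # Assumption: the strongest response decides
    return max(outputs) > threshold
```

Routing each pattern to a few small networks is what keeps the individual mapping functions simple, at the cost of depending on the clustering part to select a suitable network.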


Fig. 8. Result of classified part (rows: Class 0 to Class 4)


Fig. 7. Output of proposed system


EXPERIMENTS 

The experimental data were taken in a room under various conditions of background, brightness, and distance between people and the camera. The number of test samples is 431: dark (18 lx), bright (500 lx), complex background, plain background, big target (1 m), and small target (5 m).




KenzanNET must be trained with teaching signals in advance. First, we refine the teaching signals under the following conditions. We generate three patterns of teaching signals: pattern 1, without refinement; pattern 2, refined to 34 (false 9, correct 25); and pattern 3, refined to 52 (false 16, correct 36). The refinement procedure is as follows. We divide the teachers into correct and false and vector-quantize them using the maximum neural model according to patterns 1, 2, and 3. A codebook vector of a class is treated as false if no data belong to it. We use the codebook vector of each class as a new teacher. Second, the classified part is trained under the following conditions: the number of classes is 25, 16, 9 and 4, and the number of neighborhoods is reduced by one every ten iterations. Finally, each network in the recognition parts is trained under the following conditions: the learning rate is 1.0, training stops when the RMS error falls below 0.1, the size of a training pattern is 16x12, and the number of output units is 1.

We preprocess all the test data and decide the facial candidate patterns. We then check the facial candidate patterns of all the test data with each trained KenzanNET.


Results  and  Discussion 

The result of the preprocess is shown in Table 1. The system detects skin in 93.8% of cases, which means that the hyperspectral image contains skin features that are not influenced by changes in the conditions of the test images, and that these features are extracted in the skin detection part. The table also shows that the system produces 402 facial candidates, which include other parts of the body such as hands or legs; 287 are correct. Thus, the recognition quality is 66.4%. These results indicate that these test images are very difficult for a conventional system.


Table 1: Result of pre-process

Part               Total   Correct   Result [%]
Skin Detection     432     405       93.8
Facial Candidate   432     287*      66.4

* The number of facial candidates is 402.
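As a check on the Result column, assuming each percentage is the Correct count divided by the Total count and rounded to one decimal place, the arithmetic is:

```latex
\frac{405}{432} = 0.9375 \;\Rightarrow\; 93.8\%, \qquad
\frac{287}{432} \approx 0.6644 \;\Rightarrow\; 66.4\%
```

Both values agree with the table, so the 66.4% figure is relative to all 432 test images rather than to the 402 facial candidates.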




In the face recognition process, we compare the results of KenzanNET and CombNET. Fig. 9 shows the result of the classified part. This figure shows the relationship between the clustering energy and the number of iteration steps; in other words, it shows the quality of clustering and the computational time. From this figure, we notice that the maximum neuron model is 2 to 5 times faster than Kohonen's model, but its quality is not as good as that of Kohonen's model. The question is then whether the quality of classification is an important factor for face detection. Table 2 shows the relationship between the quality of classification and the result of face detection. From this table, we consider that the quality of classification is not so important for face detection. The maximum neural model is observed to be faster than Kohonen's model and to classify facial candidates as well as Kohonen's model.


Table 3 shows the quality of human face detection. From the results, we notice that increasing the amount of additional data does not always improve the quality in CombNET. The best quality is 76.9%, obtained by CombNET (16 (4x4) classes, without additional data) and by KenzanNET (refined to 52, with additional data 48).




Table 2: Relationship between Kohonen's model and maximum neuron model

Class          Difference in energy   Difference in result [%]
4 (2x2)        +861.0                 +0.9
(illegible)    +53328.0               +0.3
(illegible)    +23034.8               -3.9
(illegible)    +67223.8               -0.5

Table 3: Quality of detecting human face

Model                           Number of   Without additional   With additional   With additional
                                classes     data [%]             data 16 [%]       data 48 [%]
Using Kohonen model (CombNET)   9 (3x3)     72.2                 76.0              (illegible)
                                16 (4x4)    76.9                 (illegible)       (illegible)
                                25 (5x5)    71.0                 (illegible)       (illegible)
KenzanNET without refine        9 (3x3)     72.5                 73.4              71.7
                                16 (4x4)    73.0                 72.0              71.9
                                25 (5x5)    70.5                 72.0              74.9
KenzanNET refined to 34         9 (3x3)     72.4                 66.3              (illegible)
                                16 (4x4)    73.3                 73.9              64.5
                                25 (5x5)    73.4                 74.3              76.9
KenzanNET refined to 52         9 (3x3)     67.0                 70.0              75.3
                                16 (4x4)    72.1                 70.9              75.8
                                25 (5x5)    72.4                 69.1              73.5


CONCLUSION 

In this study, we developed a face detection system using hyperspectral images. We propose a new neural model, KenzanNET, which improves on CombNET in the classification and refining teaching parts. Classification is more than 2 times faster, and, with additional learning, our system needs less memory to hold teaching signals than CombNET. Our system can detect human faces with 76.9% accuracy. This result is as good as that of CombNET.


ACKNOWLEDGMENT 

Part of this study was performed as part of the Advanced Software Enrichment Project of the INFORMATION-TECHNOLOGY PROMOTION AGENCY, JAPAN.


REFERENCES 

1. A. Iwata, K. Hotta, H. Matsuo, N. Suzumura, 1991. A Large Scaled Neural Network "CombNET". Proc. of IASTED. July.

2. T. Chashikawa, K. Fujii, Y. Ajioka, Y. Takefuji, 1998. Human Detection from Camera-Images by CombNET. Proc. of the Fourth International Conference on Engineering Applications of Neural Networks, 47-53, June.

3. T. Kohonen, 1993. Physiological interpretation of the self-organizing map algorithm. Neural Networks. 6, 895-905.

4. S.C. Amartur, D. Piraino, Y. Takefuji, 1992. Optimization Neural Networks for the Segmentation of Magnetic Resonance Images. IEEE Trans. on Medical Imaging. 11(2), 215-220.

5. Y. Takefuji, K.C. Lee, H. Aiso, 1992. An artificial maximum neural network: a winner-take-all neuron model forcing the state of the system in a solution domain. Biological Cybernetics. 67, 243-251.

6. S. Oka, T. Ogawa, T. Oda, Y. Takefuji, 1998. A New Self-Organization Classification Algorithm for Remote-Sensing Images. IEICE Trans. Inf. & Syst., E81-D(1), 132-136.




Appendix.  Maximum  Neuron  Model  (MNM) 

In  classifying  P-dimensional  N  pixels  into  M  clusters,  MxN  neurons  are  required. 


Unm  is  the  input  to  the  nm  th  neuron,  and  Vnm  is  the  output.  Vnm  =  1  if  the  pixel  n  is  assigned  to 
cluster  m  ,  and  Vnm  =  0  otherwise.  xsk  is  the  density  of  pixel  k  in  the  s  th  image  file,  X k  the  feature 

vector  of  pixel  k ,  and  X ,  the  feature  vector  of  the  centroid  of  cluster  /  in  the  following  equation,  nl  is 
the  number  of  pixels  classified  into  cluster  / . 


Xk=(xl,x2,...,x  k\x,  = 


(  N  \ 

k=\ _ 

nl 


1. 


The  distance  between  pixel  k  and  cluster  /  based  on  the  square  of  Euclidean  measure  is  given  as  Rkl  in 
the  following  equation. 


2. 

The  objective  function  is  determined  by  the  mean  square  root  when  each  pixel  is  classified  into  suitable 
clusters  as  follows: 

N  M 

E  =  XX  V», 

*=i  /=i  3 

Generally speaking, the lower the value of $E$, the better the result of image clustering. The purpose of this
clustering problem is to reduce the value of $E$. In order to converge to the optimum solution by reducing
the value of $E$, the derivatives of the input $U$ with respect to time $t$ are given by:

$$\Delta U_{kl} = -R_{kl} \tag{4}$$

The output $V$ of the maximum neuron is determined by:

$$V_{km}(t+1) = \begin{cases} 1 & \text{if } U_{km} = \max[U_{kl};\ \forall l], \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

Algorithm  of  MNM 

1.  Initialize  the  input  of  neurons  U  with  uniform-random  values. 

2.  Use  the  input-output  function  of  eq.(5)  to  update  the  new  output  values. 

3. For each cluster, compute the centroid (or cluster mean) $\bar{X}_l$ using eq. (1).

4. For each neuron, compute the value of $R$ from eq. (2) and the derivative from eq. (4).

5. For each neuron, update the input $U$ using the first-order Euler method:

$$U_{kl}(t+1) = U_{kl}(t) + \Delta U_{kl} \tag{6}$$

6. Go to step 2 until the value of $E$ no longer changes.
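The six steps of the MNM algorithm can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `mnm_cluster`, the optional `U0` argument (which lets a caller supply the initial inputs instead of random ones), the iteration cap `max_iter`, and the treatment of empty clusters (their centroid is left at the origin) are all choices made for this example.

```python
import numpy as np


def mnm_cluster(X, M, U0=None, max_iter=100, seed=0):
    """Cluster the N rows of X (N x P feature vectors) into M clusters.

    Hypothetical helper that follows steps 1-6 of the MNM algorithm.
    Returns (labels, centroids, E).
    """
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    rng = np.random.default_rng(seed)
    # Step 1: uniform-random inputs unless the caller supplies them.
    if U0 is None:
        U = rng.uniform(0.0, 1.0, size=(N, M))
    else:
        U = np.array(U0, dtype=float)
    prev_E = None
    for _ in range(max_iter):
        # Step 2, eq. (5): only the neuron with the largest input fires.
        labels = U.argmax(axis=1)
        V = np.zeros((N, M))
        V[np.arange(N), labels] = 1.0
        # Step 3, eq. (1): cluster means. Assumption: an empty cluster
        # keeps a centroid at the origin instead of dividing by zero.
        n = V.sum(axis=0)
        centroids = (V.T @ X) / np.maximum(n, 1.0)[:, None]
        # Step 4, eq. (2): squared Euclidean distance pixel -> centroid.
        R = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # Eq. (3): objective for the current assignment.
        E = float((R * V).sum())
        # Step 6: stop once E no longer changes.
        if prev_E is not None and np.isclose(E, prev_E):
            break
        prev_E = E
        # Step 5, eqs. (4) and (6): Euler update with delta U = -R.
        U = U - R
    return labels, centroids, E
```

Because the update accumulates $-R_{kl}$ over iterations, a pixel's input for a distant cluster falls quickly, so the winner-take-all output settles on the nearest centroid after a few passes on well-separated data.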




Using Image Information and Partial Least Squares Method to
Estimate Mineral Concentrations in Mineral Flotation


J. Hatonen*, H. Hyotyniemi*, J. Miettunen**, L.-E. Carlsson***


*Control Engineering Laboratory, Helsinki University of Technology
P.O. Box 5400, FIN-02015 HUT, Finland
**Outokumpu Mining Oy, Pyhasalmi Mine
P.O. Box 51, FIN-86801 Pyhasalmi, Finland
***Boliden Mineral AB, Mineral Processing Department
P.O. Box 71, SE-93681 Boliden, Sweden


ABSTRACT 

This paper investigates the possibility of predicting mineral concentrations in the flotation froth using
image data acquired in real time and the Partial Least Squares (PLS) regression method. This is a
straightforward application of image analysis to the control and monitoring of the mineral flotation
process. The approach should also have potential as an industrial application for several reasons: the
measurement unit is relatively inexpensive, and it quickly supplies grade estimates as well as
important image parameters such as the speed, stability, and size of the froth bubbles. However, it will
probably not have the long-term accuracy of conventional on-stream analysers. To test the methodology
in practice, a reasonable amount of image data was collected together with froth samples from one of the
flotation cells in the zinc circuit of the Pyhasalmi mine in Finland. The collected images were processed
off-line to extract selected features, and the PLS method was then used to construct a model that
predicts the zinc concentration in the froth as a function of the extracted image features.


INTRODUCTION 

Flotation is one of the most difficult and challenging processes in the mineral processing industry. The
complexity of the process arises mainly from the inherently chaotic nature of the underlying microscopic
phenomena. Additional problems arise because today's measurement technology cannot provide a
sufficiently accurate and reliable description of the current state of the process.

Thus, most of the chemical reagents used to increase the efficiency of flotation are controlled by
human operators. The operators usually determine suitable reagent levels by analysing the
visual appearance of the froth. The measurement trends of the on-stream X-ray analysers also give
important information: the operator can check whether concentration levels increase or
decrease after a control action has been taken.

The  fast  development  of  information  technologies,  however,  has  made  it  possible  to  automatically  acquire 
images  of  the  froth  in  real-time  and  extract  features  from  the  froth  image  that  resemble  the  more  or  less 
heuristic  features  used  by  the  operators.  Thus,  the  limited  capacity  of  the  operator  to  monitor  cells 
continuously  (the  operator  is  usually  responsible  for  several  circuits  each  consisting  of  many  cells)  could  be 
increased  by  installing  a  video  camera  over  critical  cells  and  connecting  the  camera  to  a  computer  that  is 
able  to  process  grabbed  images  in  real-time.  This  topic  has  been  studied  extensively  in  the  mineral 
processing  community,  and  various  research  papers  have  been  published  on  the  subject  (see  [1],  [2],  and 
[3]).  However,  it  seems  that  few  applications  exist  that  would  deliver  useful  information  for  control  and 
monitoring  purposes  of  the  flotation  process. 

One potential application of froth images is to predict the mineral
concentration(s) in the froth as a function of the extracted image variables. Usually in the process plants the
mineral  concentrations  in  the  froth  are  not  analysed.  One  reason  for  this  is  that  sampling  of  an  unevenly 
flowing  concentrate  coming  from  an  individual  cell  is  very  difficult,  and  the  concentrate  flow  from  several 


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




cells  has  to  be  combined,  before  reliable  sampling  can  be  achieved.  Secondly,  the  on-stream  analysers  are 
relatively  expensive.  A  plant  treating  sulphide  minerals  with  Cu,  Zn  and  Pb  will  typically  have  over  50 
flotation  cells  and  the  number  of  streams  analysed  on-line  will  be  between  12  and  20.  So  the  predicted 
mineral  concentration(s)  based  on  the  image  variables  would  offer  valuable  information,  cutting  down  the 
uncertainty  in  determining  as  well  as  implementing  a  suitable  control  strategy. 

Discussions  with  the  plant  operators  reveal  that  mainly  by  using  the  colour  information  from  the  froth,  a 
rough  estimate  of  the  main  mineral  (in  this  case  zinc)  can  be  given.  Consequently,  practical  experience 
supports  the  idea  presented  in  this  paper. 


MEASUREMENT  UNIT  AND  EXPERIMENTS 

The  measurement  unit  consisted  of  an  RGB  camera  placed  above  the  froth  in  the  first  rougher  cell.  The 
camera  was  installed  inside  a  metal  hood  to  protect  it  against  dirt.  The  geometrical  shape  of  the  hood  was 
selected  so  that  homogenous  illumination  of  the  froth  would  be  obtained  for  an  easy  processing  of  the 
images.  The  camera  was  connected  to  a  frame-grabber  inside  a  PC  so  that  the  images  could  be  saved  on 
hard  disc  and  used  for  off-line  analysis. 

The actual tests were carried out as a set of factorial experiments. The manipulated process variables,
namely airflow, xanthate (collector), copper sulphate (activator), oil (frother) and lime
(depressant), each had three predetermined reference levels, and during the experiments they
were driven to 81 different combinations of these levels. This approach ensured
that both the froth appearance and the mineral concentrations in the froth varied enough
for building reliable regression models. At the same time, it provided process data for
finding statistically significant dependencies between the image features and the process variables.

Each test within an experiment consisted of the following phases. First, the reference values of the
manipulated process variables were set to the predetermined values. By monitoring different measurement
trends in the automation system, it could be seen when the process reached a steady-state condition.
Image grabbing was then switched on and images were collected for approximately 10 minutes, while
the operator manually took a sample from the froth. The main reason for collecting images
over a longer period was to filter out small variations around the steady state.

This  same  procedure  was  repeated  for  each  test.  Finally  zinc,  copper,  iron  and  lead  content,  and  the 
percentage  of  solids  were  analysed  from  the  froth  samples  at  the  process  laboratory  of  the  concentrator 
plant.  The  experiment  design  described  above  resulted  in  approximately  3240  images  and  81  sample  points. 


EXTRACTION  OF  THE  IMAGE  FEATURES 

One of the most important decisions when building any regression model is the selection of the variables
used to build the model. Based on the discussions with the operators, the colour of the froth seemed to
be the most important variable when estimating the zinc concentration in the froth.
Previous experience, on the other hand, had shown that the bubble collapse rate and the spatial variance of the
froth speed (heterogeneity) can correlate with the zinc concentration. These variables were therefore also
included in the model building.


In  order  to  give  an  accurate  numerical  description  of  the  colour  of  the  froth,  the  shape  of  the  intensity 
histogram  of  each  channel  (R,  G,  and  B)  was  described  using  the  statistical  cumulants  mean,  standard 
deviation,  skewness  and  kurtosis,  as  defined  below.  To  remove  the  distorting  effect  caused  by  the  total 
reflectance  points  in  the  bubbles,  only  pixel  values  inside  a  certain  intensity  range  were  included  in  the 
calculations. 


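$$\text{mean} = \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i \tag{1}$$

$$\text{standard deviation} = \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2} \tag{2}$$

$$\text{skewness} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - \bar{x}}{\sigma} \right)^3 \tag{3}$$

$$\text{kurtosis} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - \bar{x}}{\sigma} \right)^4 \tag{4}$$

Here $x_i$ is the intensity of pixel $i$ and $N$ is the number of pixels included in the calculation.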
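The colour description above (four statistics per channel, computed only from pixels inside an intensity range) can be sketched as follows. This is an illustrative example rather than the authors' DLL code: the function name `channel_statistics` and the range bounds `low` and `high` are hypothetical, and the plain moment definitions (division by N, no excess-kurtosis correction) are an assumption.

```python
import numpy as np


def channel_statistics(channel, low, high):
    """Mean, standard deviation, skewness and kurtosis of one colour channel.

    Only pixels with low <= value <= high are used, which drops the
    saturated total-reflectance points on the bubbles.
    """
    values = np.asarray(channel, dtype=float).ravel()
    values = values[(values >= low) & (values <= high)]
    mean = values.mean()
    std = values.std()  # population form (divide by N), an assumption
    z = (values - mean) / std
    skewness = np.mean(z ** 3)
    kurtosis = np.mean(z ** 4)
    return mean, std, skewness, kurtosis
```

Calling the function once for each of the R, G and B channels of an image yields the twelve colour variables listed in Table 1.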

The  bubble  collapse  rate  and  spatial  speed  variance  were  calculated  using  image  pairs,  where  the  sampling 
interval  between  the  two  images  is  20  milliseconds.  The  bubble  collapse  rate  measure  is  based  on  the  pixel- 
wise  difference  between  successive  images  (translation  effect  being  eliminated).  The  number  of  pixels  in 
the  difference  image  whose  value  is  over  a  certain  threshold  is  calculated,  and  this  number  is  directly  used 
as  an  indicator  for  the  bubble  collapse  rate. 


The  amount  of  translation  between  image  pairs  can  be  seen  in  the  cross-correlation  matrix  of  the  two 
images;  the  indices  of  the  maximum  element  in  the  cross-correlation  matrix  directly  give  the  translation  in 
x  and  y  directions.  The  fastest  and  the  most  robust  method  for  calculating  the  cross-correlation  matrix  is  by 
using  the  2-dimensional  Fourier  transform  (see  [4]). 


Further,  the  spatial  speed  variance  (actually  its  inverse)  is  now  defined  as  the  value  of  the  maximum 
element  in  the  cross  correlation  matrix;  this  comes  from  the  fact  that  the  more  there  is  similarity  between 
adjacent  images  (less  spatial  speed  variance),  the  higher  is  the  value  of  the  maximum  element. 
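The three steps just described (translation from the cross-correlation peak, the peak value as an inverse measure of spatial speed variance, and a thresholded difference count for bubble collapse) can be sketched with NumPy's FFT routines. The function names are hypothetical. Removing the image means, normalising the peak so that identical images give 1, and aligning with a wrap-around `np.roll` are simplifying assumptions that the source does not specify.

```python
import numpy as np


def translation_and_similarity(img_a, img_b):
    """Estimate the (row, column) shift of img_b relative to img_a.

    Returns the shift and the normalised cross-correlation peak.
    """
    a = np.asarray(img_a, dtype=float)
    b = np.asarray(img_b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    # Correlation theorem: corr(a, b) = IFFT(conj(FFT(a)) * FFT(b)).
    xcorr = np.fft.ifft2(np.conj(np.fft.fft2(a)) * np.fft.fft2(b)).real
    peak_index = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Indices past the half-way point represent negative shifts.
    shift = tuple(int(i) if i <= size // 2 else int(i) - size
                  for i, size in zip(peak_index, xcorr.shape))
    norm = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    similarity = float(xcorr[peak_index] / norm) if norm > 0 else 0.0
    return shift, similarity


def bubble_collapse_count(img_a, img_b, shift, threshold):
    """Count pixels that still differ by more than threshold after the
    translation has been removed (circular shift, a simplification)."""
    b = np.asarray(img_b, dtype=float)
    aligned = np.roll(b, (-shift[0], -shift[1]), axis=(0, 1))
    diff = np.abs(aligned - np.asarray(img_a, dtype=float))
    return int((diff > threshold).sum())
```

A high similarity value means the froth moved as a rigid sheet between the two frames (low spatial speed variance), while a large count indicates that many pixels changed for reasons other than translation.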

The image processing thus yielded 14 parameters for each image, as shown in Table 1.


Table  1:  Extracted  image  variables. 


| Variable number | Variable name | Variable number | Variable name |
|---|---|---|---|
| 1 | Bubble collapse rate | 8 | Green std |
| 2 | Spatial speed variance | 9 | Green skewness |
| 3 | Red mean | 10 | Green kurtosis |
| 4 | Red std | 11 | Blue mean |
| 5 | Red skewness | 12 | Blue std |
| 6 | Red kurtosis | 13 | Blue skewness |
| 7 | Green mean | 14 | Blue kurtosis |

The calculations were carried out in the LabVIEW environment, where each image processing algorithm
was implemented as a Microsoft Visual C++ DLL. The basic functionality provided by LabVIEW was used to
read the images from the hard disc and to save the numerical results of the different image processing
algorithms to an Excel spreadsheet, so that the results could be easily transferred to other Windows applications.


PARTIAL  LEAST  SQUARES  REGRESSION 

The relationships within the data are often represented using regression models, where the output variables
are expressed in terms of the input variables. The traditional and widely used multilinear regression (MLR)
method is not robust: in practical applications, the variables tend to depend on each other, and this
collinearity of the data can ruin the regression model altogether. To enhance the robustness of regression,
different kinds of schemes can be implemented; the basic idea is to project the data onto a lower-
dimensional subspace. The compressed data is thereafter mapped onto the output space (see [7] for details).

One approach to selecting the basis vectors of the internal subspace is to maximise the variance of the
projected data; this approach results in principal component analysis (PCA) and principal component
regression (PCR); see [5] and [6]. Even though the collinearity problem can be avoided efficiently this way,
the regression models may not be good: because only the variance of the input data is weighted, its
relevance with respect to the outputs cannot be guaranteed. The solution to this dilemma is partial least
squares (PLS) regression: the data is projected so that the correlation between input and output variables
is maximised [8].
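The difference between the two projection criteria can be made concrete with a toy example. The sketch below compares the first PCA direction (maximum input variance) with the first PLS weight vector, taken here as the normalised covariance between the centred inputs and the output, which is how the common NIPALS formulation of PLS1 defines it. The function names and the toy data are hypothetical.

```python
import numpy as np


def first_pca_direction(X):
    """Direction of maximum variance of the centred input data."""
    Xc = X - X.mean(axis=0)
    # The first right singular vector spans the largest-variance direction.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


def first_pls_direction(X, y):
    """First PLS1 weight vector: normalised covariance of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    return w / np.linalg.norm(w)


# Toy data: the first input has a large variance but no relation to y,
# the second has a small variance and determines y completely.
X_demo = np.array([[-10.0, -1.0], [10.0, -1.0], [-10.0, 1.0], [10.0, 1.0]])
y_demo = X_demo[:, 1].copy()
pca_dir = first_pca_direction(X_demo)
pls_dir = first_pls_direction(X_demo, y_demo)
```

On this data the PCA direction lies along the first, irrelevant input (up to sign), whereas the PLS direction lies along the second input, the only one that carries information about the output.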


CONSTRUCTION  AND  VALIDATION  OF  THE  PLS  PREDICTION  MODEL 

At the beginning of the model building the data was divided randomly into two sets: a training data set
of 60 samples and a validation data set of 21 samples. The training data set was scaled to zero mean
and unit variance, because the variables did not have equal variance.
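A random 60/21 split with autoscaling can be sketched as below. The function name `split_and_scale` and the seed are hypothetical. The source states only that the training set was scaled, so applying the training mean and standard deviation to the validation inputs, and leaving the output unscaled, are assumptions of this example.

```python
import numpy as np


def split_and_scale(X, y, n_train=60, seed=0):
    """Randomly split the samples and autoscale the inputs.

    The scaling parameters come from the training rows only and are
    reused for the validation rows (an assumption, see the lead-in).
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    train, valid = order[:n_train], order[n_train:]
    mean = X[train].mean(axis=0)
    std = X[train].std(axis=0, ddof=1)

    def scale(block):
        return (block - mean) / std

    return scale(X[train]), y[train], scale(X[valid]), y[valid]
```

With 81 samples and `n_train=60`, the remaining 21 samples form the validation set.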

Because  PLS  is  inherently  a  linear  method,  the  detection  of  possible  outliers  is  an  important  step.  The 
cross-validation  approach  used  here  was  to  leave  data  samples  out  one  at  a  time,  and  construct  a  PLS  model 
using  the  rest  of  the  samples  (using  a  fixed  number  of  latent  variables,  in  this  case  three).  Finally  the 
parameters  of  the  regression  vector  were  plotted  as  functions  of  the  sample  number  that  was  left  out.  If  a 
certain  parameter  vector  differed  considerably  (checked  by  visual  inspection  only)  from  the  rest  of  the 
samples,  this  sample  was  classified  as  an  outlier  and  left  out  from  further  modelling.  In  our  case,  three 
samples  in  the  training  set  were  considered  as  outliers  and  were  left  out  from  further  model  building. 

To  maximise  the  prediction  power  of  the  PLS  method,  a  "leave-one-out"  cross-validation  method  was  used 
(see  [8])  to  determine  the  suitable  number  of  latent  variables  for  the  training  set.  The  method  suggested  that 
five  latent  variables  is  the  optimum  number  for  modelling  purposes.  The  amount  of  variation  explained  by 
each  latent  variable  in  both  X  and  Y  data  is  shown  in  Table  2. 
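Both leave-one-out procedures described above (inspecting the regression vector for outliers, then choosing the number of latent variables) can be sketched with a small NIPALS PLS1 routine. This is an illustration under assumptions, not the authors' code: the function names are hypothetical, the inputs are autoscaled inside each fold, and the prediction error sum of squares (PRESS) is assumed as the selection criterion, which the source does not name.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """NIPALS PLS1 on autoscaled inputs (columns must not be constant)."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_mean = y.mean()
    E = (X - x_mean) / x_std          # input residual matrix
    f = y - y_mean                    # output residual vector
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_components))
    P = np.zeros((n_vars, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)     # weight vector
        t = E @ w                     # latent variable scores
        tt = t @ t
        P[:, a] = E.T @ t / tt        # input loadings
        q[a] = f @ t / tt             # output loading
        W[:, a] = w
        E = E - np.outer(t, P[:, a])  # deflate inputs
        f = f - q[a] * t              # deflate output
    # Regression vector expressed in scaled input units.
    b = W @ np.linalg.solve(P.T @ W, q)
    return {"x_mean": x_mean, "x_std": x_std, "y_mean": y_mean, "b": b}


def pls1_predict(model, X):
    scaled = (X - model["x_mean"]) / model["x_std"]
    return model["y_mean"] + scaled @ model["b"]


def leave_one_out(X, y, n_components):
    """Return the regression vector obtained with each sample left out
    (one row per sample) and the summed squared prediction error."""
    n = len(y)
    coefs = np.zeros((n, X.shape[1]))
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        model = pls1_fit(X[keep], y[keep], n_components)
        coefs[i] = model["b"]
        residual = y[i] - pls1_predict(model, X[i:i + 1])[0]
        press += float(residual ** 2)
    return coefs, press
```

Plotting the rows of `coefs` against the index of the omitted sample corresponds to the visual outlier check, and calling `leave_one_out` for each candidate number of latent variables and comparing the PRESS values corresponds to the selection step.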


Table  2:  The  amount  of  variance  explained  by  different  latent  variables. 


| Latent variable | Explained variance in X block (%) | Explained variance in X, cumulative (%) | Explained variance in Y (%) | Explained variance in Y, cumulative (%) |
|---|---|---|---|---|
| 1 | 68.54 | 68.54 | 50.57 | 50.57 |
| 2 | 13.47 | 82.01 | 7.48 | 58.05 |
| 3 | 8.21 | 90.22 | 2.18 | 60.23 |
| 4 | 6.39 | 96.62 | 3.08 | 63.31 |
| 5 | 1.8 | 98.42 | 2.68 | 65.98 |

It is useful to note that five latent variables are able to explain 98.42% of the variation in the X (image)
variables, although the original dimension of the data was 14. Thus the presence of strong collinearity in the
image data is evident, and a standard MLR approach would not have worked.


Once the regression model has been built, it is always important to verify that it corresponds to the a priori
information known from the specific domain. When the PLS approach is used, the natural way is to analyse the
latent vectors; the latent vectors capture the correlation structures in the data.

In  our  case  the  sphalerite  (ZnS)  mineral  is  reddish-brown,  and  so,  this  colour  tone  should  increase  in  the 
froth,  when  the  zinc  concentration  in  the  froth  increases.  Previous  correlation  analyses  carried  out  for 
different  data  sets,  on  the  other  hand,  had  shown  that  low  spatial  variance  of  the  froth  speed  and  low  bubble 
collapse  rate  correlate  with  high  zinc  concentration  and  vice  versa.  When  analysing  the  contribution  of  the 
different  latent  variables  to  explain  the  variance  in  Y  variable  (zinc  concentration),  the  first  latent  variable 
explained  over  50%  of  the  total  variance.  The  rest  of  the  latent  variables  explained  considerably  smaller 
amounts  of  variance  in  Y.  Consequently  the  first  latent  variable  seems  to  explain  a  global  relation  between 
image  variables  and  zinc  concentrations,  and  so  the  a  priori  known  relations  should  be  found  in  it. 




Figure  1  reveals  that  low  bubble  collapse  rate  and  low  spatial  speed  variance  of  the  froth  are  related  to  high 
zinc  concentration  in  the  froth.  The  mean  value  of  the  red  channel  seems  to  play  a  slightly  more  important 
role  compared  to  other  channels,  which  was  expected.  In  addition  to  this,  it  seems  that  also  the  higher 
statistical  cumulants  in  each  channel  have  similar  behaviour;  this  suggests  that  intensity  distribution  of  each 
channel  becomes  flatter  and  skewed  to  the  left,  as  the  froth  carries  more  zinc  (this  has  been  further 
validated  in  [2]).  This  result  is  quite  expected,  because  usually  a  froth  having  high  zinc  concentration  tends 
to  have  smaller  bubbles,  and  as  a  consequence  there  are  more  dark  shadows  in  the  froth  compared  to  a  froth 
having  low  zinc  concentration.  Thus,  it  seems  that  the  model  has  been  able  to  capture  true  relations 
between  the  image  variables  and  the  zinc  concentration. 

To  finally  test  how  well  the  model  is  really  working,  validation  data  is  used.  Figure  2  shows  how  accurately 
the  obtained  model  can  predict  samples  in  the  validation  data  set  using  five  latent  vectors. 


Fig.  2.  The  actual  and  predicted  zinc  concentrations  in  the  validation  data  set. 

The  average  prediction  error  is  1.4%  Zn  and  the  maximum  error  is  3.4%  Zn  in  the  validation  data  set. 
These  results  can  be  considered  fairly  good,  because  the  predictions  are  made  over  a  very  wide  range  of 
zinc  concentrations.  It  also  seems  that  the  model  is  able  to  react  to  large  changes  in  the  zinc  concentration, 
but  smaller  variations  are  not  predicted  so  accurately.  The  main  reason  could  be  that  colour  measurement 
with a standard RGB camera is not accurate enough, which is also the result obtained in [9].
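The two validation figures quoted above can be computed as in the short sketch below. The source does not define "average prediction error"; the mean absolute error is assumed here, and the function name is hypothetical.

```python
import numpy as np


def prediction_errors(actual, predicted):
    """Mean and maximum absolute prediction error (same units as the inputs)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = np.abs(actual - predicted)
    return float(err.mean()), float(err.max())
```

Applied to the laboratory assays and the model predictions of the validation samples, both in % Zn, the function returns the average and the maximum error in % Zn.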




CONCLUSIONS  AND  FURTHER  WORK 

This paper has shown how image data can be used to predict mineral concentrations in the
froth using a PLS approach. As a linear method, PLS results in a transparent model that is
easy to validate against a priori information. It was shown that PLS is an easy and fast method for building
linear regression models to predict mineral concentrations in the froth, even when the input data is highly
collinear. It turns out that in many cases the linear methods may be smarter than the "intelligent" ones!

PLS  also  includes  a  variety  of  different  tools  for  statistical  hypothesis  testing.  For  example,  when  new  data 
arrives,  it  can  be  checked  whether  the  model  for  A  data  can  explain  it  statistically  significantly  (so  called 
distance to model approach), and if not, the model is probably no longer valid and re-calibration has to be
carried out [8]. Thus the PLS framework contains a lot of additional functionality that is useful in practical
modelling and prediction work.

However, the authors are aware that there are still problems to be solved before the method can be used for
predicting mineral concentrations in the froth in practice. The first problem is the variation in the quality
of the incoming ore, which sometimes means that the colour of the ore also changes. Some kind of
automatic re-calibration procedure would therefore be needed to overcome this problem, resulting in a more
complex and expensive set-up. The second problem is to guarantee accurate measurement of the colour.
As noted earlier, a standard RGB camera does not seem to be accurate enough for measuring the colour of the
froth (see [9]). Accordingly, a more accurate measurement method should be used; for example, a
spectrophotometer could be a potential solution. The lamp used for illumination also causes inaccuracies in
colour measurement because it loses power with age, and some kind of reference
measurement should be used to eliminate this effect.


ACKNOWLEDGEMENTS 

The work described was financially supported within the framework of the ChaCo (Characterisation of
Flotation Froth Structure and Color by Machine Vision) Esprit Long Term Research Project No. 24931.


REFERENCES 

1. A. Cipriano, M. Guarini, R. Vidal, A. Soto, C. Sepulveda, D. Mery, H. Briseno, 1998. A real time
visual sensor for supervision of flotation cells. Minerals Engineering, 11 (6), pp. 489-499.

2.  A.J.  Niemi,  R.  Ylinen,  H.  Hyotyniemi,  1997.  On  characterization  of  pulp  and  froth  in  cells  of  flotation 
plant.  Int.  J.  of  Mineral  Processing,  51,  pp.  51-65. 

3. D.W. Moolman, C. Aldrich, J.S.J. van Deventer, D.B. Bradshaw, 1995. The interpretation of flotation
froth surfaces by using digital image analysis and neural networks. Chem. Eng. Sci., 50 (22), pp. 3501-3513.

4. E.O. Brigham, 1988. The Fast Fourier Transform and Its Applications. Prentice-Hall, London, UK.

5.  A.  Basilevsky,  1994.  Statistical  Factor  Analysis  and  Related  Methods.  John  Wiley  &  Sons,  New  York. 

6. J. Hatonen, H. Hyotyniemi, G. Bonifazi, S. Serranti, F. Volpe, L.-E. Carlsson, 1999. Using PCA in
controller strategy design for a flotation process. Preprints 14th IFAC World Cong., July 5-9, Beijing.

7. H. Hyotyniemi, 1998. Summary - on linear multivariate methods. In Multivariate Statistical Methods in
Systems Engineering. Helsinki Univ. of Tech., Control Eng. Lab., Report 112. Available on the Internet
at http://saato014.hut.fi/Hyotyniemi/publications/98_report112.htm.

8.  H.  A.  Martens,  1985.  Multivariable  Calibration  -  Quantitative  interpretation  of  non-selective  chemical 
data.  Doctoral  Thesis,  Technical  University  of  Norway,  Trondheim,  Norway. 

9.  A.  Siren,  1999.  The  characterisation  of  Froth  Colour  by  Machine  Vision.  Preprints  of  the  EOS/SPIE 
International  Symposia,  June  14-18,  Munich,  Germany. 




A  Combined  Morphological  and  Color  Based  Approach  to 
Characterize  Flotation  Froth  Bubbles 

Giuseppe  Bonifazi,  Silvia  Serranti,  Fabio  Volpe,  and  Riccardo  Zuco 

Dipartimento  di  Ingegneria  Chimica,  dei  Materiali,  delle  Materie  Prime  e  Metallurgia 
Universita di Roma “La Sapienza”, Via Eudossiana 18, 00184 Rome, Italy


ABSTRACT 

Flotation  process  monitoring  and  control  are  complex  targets  due  to  the  highly  non-linear  behavior  of  the 
process  and  the  large  number  of  variables  involved.  Control  is  generally  achieved  by  adopting  a  human  based 
approach.  By  observing  the  surface  of  flotation  cells,  experienced  plant  operators  suggest,  on  the  basis  of  their 
experience,  control  actions  such  as  changing  the  cell  level  set  points  and/or  modifying  reagent  doses.  The  main 
goal  of  this  paper  is  to  demonstrate  that  with  a  combined  approach  based  on  evaluation  of  the  morphological 
and  morphometrical  features  of  froth  bubbles  and  their  color  characteristics,  it  is  possible  to  perform  a  froth 
structure  analysis.  Froth  structure  modeling  permits  us  to  derive  useful  information  about  the  flotation  process 
behaviour.  The  structure  analysis  carried  out  by  means  of  digital  imaging  procedures  based  on  color,  texture 
and  morphometry  enables  definition  of  froth  classes  that  define  mineral  concentration  using  estimation  models 
set  up  from  an  analysis  of  froth  images.  The  paper  shows  that  by  adopting  such  an  approach,  it  is  possible  to 
identify  different  froth  classes  and  utilize  the  results  inside  estimation  models. 


INTRODUCTION 

It is well known that flotation is a very complex process. Its characteristics are strongly affected by many
factors related both to the ore (grade, particle size distribution, particle morphology, degree of liberation of
the size classes to float, particle surface properties, etc.) and to the process itself (variation in operating
conditions). In the past, many attempts have been made to control the process, with different levels of
intervention and different strategies of data collection. The resulting procedures are in many cases
insufficient to realize optimal control of the process, and the resulting strategies are "too heavy" to be
handled by plant operating personnel.

On the other hand, very sophisticated control techniques need long periods of tuning and can be set up only by
skilled control engineers, who are not always "available" in terms of time and costs [1]. Flotation presents
intrinsic characteristics that, especially in the past, were largely utilized at a heuristic level by plant operators.
According to the behavior of the process, flotation froths present different pictorial aspects. Such aspects are
strongly correlated with the grade and recovery of valuable minerals in the concentrate. Woodburn et al. [2]
were the first to remark that an optimal froth structure can be recognized visually, and that the image can be
quantitatively characterized by image analysis techniques. Since these studies, several works have been
published dealing with image processing/analysis procedures applied to flotation froth characterization. They
describe several techniques, approaches and applications, but they were mainly oriented toward analyzing or
solving a specific target strictly linked with a specific process. For these reasons, as clearly shown by
Moolman et al. [3], such work did not analyze in depth the problems arising when a fully digital imaging based
procedure is applied to complex image samples such as flotation froths. Even though these works contributed
greatly to the development and application of such an approach, the results obtained demonstrate once
more the complexity of the problem and that further research is needed to realize a vision system able to
perform the required detection and processing. The main issues still to be addressed relate to both the image
analysis and the flotation sides of the problem. In this paper, we analyze, discuss and critically evaluate the
problems encountered when developing an approach to analyze flotation froth characteristics at the plant level
in terms of bubble shape and color. The approach is fully automatic and based on digital imaging; it aims to
numerically define and identify froth classes and, consequently, to formalize numerical models that estimate the
concentration of valuable minerals inside the froth.




Fig.  1.  Pictorial  examples  of  flotation  froth  classes  as  they  result  from  “human  based”  classification. 

1: class consisting of quite small bubbles, with an elliptical shape. 2: class consisting of medium sized bubbles,
with  regular  shape.  3:  class  consisting  of  regular  large  bubbles.  4:  class  consisting  of  circular  bubbles,  with 
both  large  and  small  bubbles.  5:  class  consisting  of  tiny  bubbles.  The  froth  is  more  like  a  mud. 


DIGITAL  IMAGE  SAMPLES  AND  FROTH  CLASS  DEFINITION 

About 1500 digital images of flotation froths were acquired during 56 experiments carried out in 1997 at the
Boliden plant in Garpenberg (Sweden). The camera was placed over the first scavenger cell of the copper-lead
circuit. Each experiment was up to 5 minutes long and produced up to 15 samples. Every sample
consisted of a pair of pictures taken 0.2 seconds apart, with a delay of 20 seconds
between successive image pairs. At the same time the pictures were taken, the corresponding froths were
sampled and the %Cu, %Zn, %Pb and %MgO concentrations were analyzed. The goal of this first phase was
to define froth classes. Since the total number of images was very high, the dataset used for froth class
definition was a subset of the original. Images belonging to the same experiment
have similar macro-features, so it was possible to take only one image from each experiment and assign
all the images of that experiment to one class. The prepared dataset thus consisted of one image from each
experiment. Each image, constituting the reference data, was taken at the same time (60 seconds) from the
beginning of its experiment. All acquisitions were made in full color, so RGB (red, green and blue) images
with a depth of 24 bits per pixel were processed. The resolution of each image was 288x384 pixels.

Once the database was created, three people, not expert in image processing but with a good scientific and
cultural level, were selected. They were supplied with the images of the database and asked to
define some classes and to assign each image of the database to one of the classes. The results were
remarkably homogeneous. Four classes were defined according to the shape and size of the flotation froth bubbles.
The identified classes, defined by non-experts, were then analyzed by experts. As a result of this additional
analysis, a new class, based on color characteristics, was added to the original four classes. So a total of five
froth classes were identified (Figure 1):

• Class 1: consisting of quite small bubbles, with an elliptical shape.

•  Class  2:  consisting  of  medium  sized  bubbles,  with  regular  shape. 

•  Class  3:  consisting  of  regular  large  bubbles. 

•  Class  4:  consisting  of  circular  bubbles,  with  both  large  and  small  bubbles. 

• Class 5: consisting of very small bubbles; the froth resembles mud.

The assignment of each image to one of the classes varied somewhat among the non-experts, but the
discrepancies were not great. An expert supervised the human-based classification/assignment to resolve
ambiguous cases. Each image from the Garpenberg plant was thus assigned to a specific class.


FROTH  BUBBLE  CHARACTERIZATION 

In order to generate a morphological and color characterization of the froth, the first step was to select a
method for bubble identification and a method to analyze froth color. To segment the froth image and to
identify and characterize each bubble, morphologically and morphometrically, the watershed technique was
applied [4]. Several parameters were identified, computed and averaged for each processed image for
morphological characterization of the froths. Froth color characterization was carried out by adopting two
different approaches: i) color feature analysis applied to the whole image sets in different color spaces and ii)
color feature analysis of single bubbles in different color systems.




Fig. 2. Source image (2a) and corresponding segmented image (2b) after applying different enhancement
techniques followed by the watershed filter.

Froth  Image  Segmentation 

Image segmentation partitions the spatial domain into mutually exclusive subsets, called regions, each of
which is uniform and homogeneous with respect to a property such as tone or texture, and differs in that
property from its neighbors. Image segmentation is a crucial step, especially
when  specific  domains  of  the  image  (bubbles)  must  be  numerically  characterized  and  classified  for 
recognition.  The  quality  and  interpretation  of  measurements  of  different  parts  of  an  image  depend  critically  on 
the  "quality"  of  segmentation  which  assigns  pixels  to  a  particular  class.  Different  methods  are  available: 

Region-based  Approach  (RBA):  assign  pixels  to  regions/objects  (thresholding  or  watershed  segmentation). 
Boundary-based  Approach  (BBA):  locate  regional  boundaries  (boundary-tracking,  Laplacian  filtering). 
Edge-based  Approach  (EBA):  identify  edge  pixels  and  link  them  to  form  boundaries. 

The  Watershed  Algorithm 

In this work an RBA (watershed) was used. The intensity of the froth image was used as a DTM (Digital
Terrain Model), with the elevation related to the tone of the pixels in the image. The watershed segmentation
algorithm is based on flooding a landscape or topographic relief with water: holes are pierced at
the positions of the local minima and the landscape is immersed in a lake. Basins fill with water
starting at the local minima, and dams are built at the points where water coming from different basins would meet.
When the water level has reached the highest peak in the landscape, the process is stopped. The set of dams
obtained partitions the region into catchment basins. These dams, projected on the horizontal plane, are the
watershed lines. A recursive algorithm exists to compute the watershed transform [5]. The basic structure of
the algorithm is a loop in which the image is thresholded at successive gray levels. At each iteration, the basins
belonging to the minima are extended by their influence zones within the binary image obtained by
thresholding at the current gray level. The watershed transform can suffer from severe over-segmentation, so
the images must be preprocessed by applying edge detectors and smoothing filters (Figure 2).
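
The flooding idea described above can be illustrated with a short sketch. It is not the immersion algorithm of [5]: it is a simpler priority-queue flooding variant that grows labelled seed pixels in order of increasing gray level and builds no explicit dam pixels, so the watershed lines are simply the places where neighbouring labels differ. The function name, the use of 4-connectivity and the requirement that seeds (one per basin) are supplied by the caller are all assumptions made for illustration.

```python
import heapq


def flood_watershed(height, markers):
    """Grow labelled seeds over a gray-level relief, lowest pixels first.

    height  : 2-D list of gray levels (the "terrain").
    markers : 2-D list of ints, >0 at seed pixels (one label per basin), 0 elsewhere.
    Returns a 2-D list in which every reachable pixel carries a basin label.
    """
    rows, cols = len(height), len(height[0])
    labels = [row[:] for row in markers]
    heap, counter = [], 0          # counter keeps the pop order deterministic on ties
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] > 0:
                heapq.heappush(heap, (height[r][c], counter, r, c))
                counter += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):   # 4-connected neighbours
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                labels[nr][nc] = labels[r][c]                # water reaches this pixel
                heapq.heappush(heap, (height[nr][nc], counter, nr, nc))
                counter += 1
    return labels


# Two basins separated by a ridge of height 9.
relief = [[1, 2, 3, 9, 3, 2, 1]]
seeds = [[1, 0, 0, 0, 0, 0, 2]]
basins = flood_watershed(relief, seeds)
```

In this variant the ridge pixel is absorbed by whichever basin reaches it first; an implementation that needs one-pixel-wide dams would mark such contested pixels separately.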

The procedure adopted is as follows. The RGB (red, green and blue) source images of froth are first cropped to
a window of 256x256 pixels to make them comparable and to eliminate black borders; then the intensity
channel is extracted. The best image enhancement is accomplished by applying a sharpening filter.
The maximum frequency of the gray level histogram of each image is then evaluated. This
value, recalculated as a percentage, is taken as the threshold value for the watershed filter. Applying
the watershed filter to the images yields the bubble segmentation [6].

Froth  Image  Morphological  Parameters 

For each segmented image, a list of geometrical/morphological parameters (e.g. area, aspect, area/box, box/xy,
major and minor axis, min/max diameters, min/max radius, perimeter, roundness, length and width), which
describe a single bubble, is computed. In order to obtain a single value for the whole image morphology, the
value of each parameter is averaged over all bubbles with an area exceeding 15 pixels. For each image, 42
output parameters (means and standard deviations) are identified and computed.
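
As an illustration of the averaging step, the following sketch computes image-level means and standard deviations for two per-bubble parameters from a labelled segmentation. Only the 15-pixel area filter comes from the text; the choice of parameters (area and a bounding-box width/height ratio used here as "aspect"), the use of the population standard deviation, the function name and the key AspectDev are assumptions, since the definitions used by the authors are not given.

```python
from statistics import mean, pstdev


def image_morphology(labels, min_area=15):
    """Average per-bubble parameters over bubbles larger than min_area pixels.

    labels : 2-D list of ints; 0 marks watershed lines, >0 is a bubble label.
    Returns a dict of image-level values, or None if no bubble passes the filter.
    """
    regions = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab == 0:
                continue
            reg = regions.setdefault(lab, {"area": 0, "rmin": r, "rmax": r, "cmin": c, "cmax": c})
            reg["area"] += 1
            reg["rmin"], reg["rmax"] = min(reg["rmin"], r), max(reg["rmax"], r)
            reg["cmin"], reg["cmax"] = min(reg["cmin"], c), max(reg["cmax"], c)
    kept = [reg for reg in regions.values() if reg["area"] > min_area]
    if not kept:
        return None
    areas = [reg["area"] for reg in kept]
    # Bounding-box width divided by height: an assumed stand-in for "aspect".
    aspects = [(reg["cmax"] - reg["cmin"] + 1) / (reg["rmax"] - reg["rmin"] + 1) for reg in kept]
    return {
        "Count": len(kept),
        "AreaAvg": mean(areas), "AreaDev": pstdev(areas),
        "AspectAvg": mean(aspects), "AspectDev": pstdev(aspects),
    }


# Two bubbles (16 and 20 pixels) plus a 2-pixel fragment that is filtered out.
demo = [[1] * 4 + [0] + [2] * 5 + [0] + [3 if r < 2 else 0] for r in range(4)]
stats = image_morphology(demo)
```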

Froth  Image  Color  Algorithm 

The color algorithms used for the test are based on the color of the froth. They simply compute average
values and standard deviations at image level, using the whole image or the segmented images (that is,
considering only bubbles with an area exceeding 15 pixels), in three color spaces: RGB (red, green, blue),
HSV (hue, saturation, value) and IHS (intensity, hue, saturation). No significant differences were detected
between the single-domain (bubble) approach and the full-image color approach.
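
A minimal sketch of such image-level color statistics is given below for the RGB and HSV spaces, using the conversion in the Python standard library. The IHS space is omitted because the transform used by the authors is not stated. The function name and the Dev* keys are hypothetical; hue is averaged arithmetically, which ignores its circular nature and is only adequate when the hues in an image are clustered.

```python
import colorsys
from statistics import mean, pstdev


def color_statistics(pixels):
    """Mean and standard deviation of each channel in RGB and HSV.

    pixels : list of (R, G, B) tuples with values 0..255, e.g. all pixels of an
             image or only those belonging to bubbles larger than 15 pixels.
    All results are scaled to the range 0..1.
    """
    rgb = [(r / 255, g / 255, b / 255) for r, g, b in pixels]
    hsv = [colorsys.rgb_to_hsv(*p) for p in rgb]
    stats = {}
    for names, data in ((("Red", "Green", "Blue"), rgb), (("Hue", "Sat", "Val"), hsv)):
        for k, name in enumerate(names):
            channel = [p[k] for p in data]
            stats["Avg" + name] = mean(channel)
            stats["Dev" + name] = pstdev(channel)
    return stats


gray_pair = color_statistics([(0, 0, 0), (255, 255, 255)])
pure_red = color_statistics([(255, 0, 0), (255, 0, 0)])
```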


[Plot: Estimation of MgO - Garpenberg data. Measured %MgO and estimated %MgO versus image number.
Ro2 = 0.854 - Average error = 1.00 (6.77% of measured value). Parameters used: AvgVal AvgHue AreaDev]

Fig. 3. Relationship between computed %MgO and the same value from lab tests (talc concentration).
The parameters used for the estimate are AverageValue, AverageHue and AreaDeviation.


[Plot: Estimation of Zn - Garpenberg data. Measured %Zn and estimated %Zn.
Ro2 = 0.863 - Average error = 0.84 (9.57% of measured value). Parameters used: AvgHue AspectAvg AvgSat(IHS)]

Fig. 4. Relationship between computed %Zn and the same value obtained from laboratory tests.
The parameters used for the estimate are AverageValue, AverageAspect and AverageSaturation.


ESTIMATION  OF  MINERAL  CONCENTRATION 

Statistical  Models 

Estimation has been carried out for copper (Cu), zinc (Zn) and talc concentration (MgO), and a
satisfactory average error has been obtained. The quality of the estimation has been quantified by means of
regression analysis (the Ro2 of the distribution, the average error and the maximum error).




[Plot: Estimation of MgO - Garpenberg data. Measured and estimated %MgO versus image number.
Ro2 = 0.907 - Average error = 0.77 (5.43% of measured value). Parameters used: AvgVal AvgHue AreaDev]

Fig. 5. Relationship between computed %MgO and the same value from laboratory tests.
The analyses have been carried out considering the different identified froth classes.
The parameters used for the estimate are AverageValue, AverageHue and AreaDeviation.


[Plot: Estimation of Zn - Garpenberg data. Measured %Zn and estimated %Zn versus image number.
Ro2 = 0.907 - Average error = 0.66 (6.65% of measured value). Parameters used: AvgHue AspectAvg AvgSat(IHS)]

Fig. 6. Relationship between computed %Zn and the same value from laboratory tests.
The analyses were carried out considering the different identified froth classes.
The parameters used for the estimate are AverageValue, AverageAspect and AverageSaturation.

The analyses applied to the whole set of digitally collected images produce the following results: i) an average
error on the MgO concentration of about 1.00 %, i.e., 6.8% of the real value, with an Ro2 of 0.854; ii) an average
error on the computed Zn value of about 0.84 %, i.e., 9.6% of the real value, with an Ro2 of 0.863; and iii) an average
error on the computed Cu value of about 0.23 %, i.e., 33.9% of the real value, with an Ro2 of 0.823. The
expressions for the estimated mineral concentration were derived from a subset of the original images, and the
resulting correlation was applied to the remaining images. The approach is based on the definition of a single
expression that can fit the mineral concentration on all samples (images). From an analysis of Figures 3 and 4
it is evident that there are some groups of images (each group belonging to one experiment and therefore presenting
the same value of concentration of the investigated element) for which the estimation expression gives the worst results.
For example, in the case of talc, where the average error is 1.00 %, Figure 3 shows that the error on the
first experiment (image numbers 1 to 30) is quite high. The same situation is present in at least
two or three other experiments (image numbers 241 to 298). For zinc, especially in the second
experiment (image numbers 31 to 60), there are large discrepancies between computed and
measured values. If these discrepancies relate to macroscopic froth differences, as appears to be the case, making
the mineral computation expressions depend upon macro-features may be appropriate.

Critical  Froth  Classes  and  Statistical  Models 

In order to improve the estimate of mineral concentration, froth classes are accounted for in the analyses. All
images are assigned to one of the five classes previously identified. For each class, the correlation is checked
and, if its accuracy is below the global level, it is re-computed for that class. The result is a marked improvement
in the estimation correctness (Figures 5 and 6) as compared with the results in Figures 3 and 4 respectively. The
estimation of "difficult" experiments has improved, and the Ro2 of the regression has increased to 0.907 both
for talc (from 0.854) and zinc (from 0.863). The average error decreased from 1.00 to 0.77 % for MgO, and
from 0.84 to 0.66 % for Zn. The maximum error also decreased from 3.77 to 3.35 % for MgO and from 3.65
to 2.72 % for Zn. In the case of Cu, the Ro2 increased from 0.823 to 0.894 and the average error decreased
from 0.23 to 0.20 %. The introduction of critical froth classes thus clearly improved the quality of the estimates. The
results obtained with "froth classes" can be considered satisfactory: the Ro2 values are all close to 0.9 and
the estimated concentrations are close to the measured values.
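
The class-wise re-computation described above can be sketched as follows. This is an illustration under strong simplifications, not the authors' model: a single predictor and an ordinary least-squares line stand in for the three-parameter expressions, the ordinary coefficient of determination stands in for Ro2 (whose exact definition is not given here), and all function names and the demonstration data are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def r_squared(y, pred):
    """Coefficient of determination of the predictions."""
    my = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, pred))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def average_error(y, pred):
    """Mean absolute difference between measured and estimated values."""
    return mean(abs(a - b) for a, b in zip(y, pred))


def classwise_estimate(x, y, classes):
    """Fit one global line, then refit each class whose error exceeds the global error."""
    slope, intercept = fit_line(x, y)
    global_pred = [slope * v + intercept for v in x]
    global_err = average_error(y, global_pred)
    refined = list(global_pred)
    for label in sorted(set(classes)):
        idx = [k for k, c in enumerate(classes) if c == label]
        class_err = average_error([y[k] for k in idx], [global_pred[k] for k in idx])
        # Refit only when the class is worse than the global level and the fit is defined.
        if class_err > global_err and len({x[k] for k in idx}) >= 2:
            s, b = fit_line([x[k] for k in idx], [y[k] for k in idx])
            for k in idx:
                refined[k] = s * x[k] + b
    return global_pred, refined


# Hypothetical data: class 2 follows a different trend from class 1.
feature = [0, 1, 2, 3, 4, 5, 0, 1]
measured = [0, 2, 4, 6, 8, 10, 10, 9]
froth_class = [1, 1, 1, 1, 1, 1, 2, 2]
global_pred, refined = classwise_estimate(feature, measured, froth_class)
```

Comparing `r_squared(measured, refined)` with `r_squared(measured, global_pred)` shows the kind of improvement reported above, although the magnitudes in this toy example carry no meaning for real froth data.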

CONCLUSION 

The adoption of a combined morphological and color-based approach to characterize flotation froth bubbles
has been shown, with reference to the analyzed set of digital images, to be quite promising, giving
in some cases excellent results and allowing the definition of procedures suitable for implementation inside specific
control engines. The extensive studies showed that the procedures, as originally designed and applied, are
quite sensitive to i) the bubble segmentation algorithms and ii) the definition of the critical froth classes introduced
into the estimation procedures. With reference to this last point, it is important to define suitable
classification procedures in order to make an "a-priori" assignment of froths to critical classes. The
possibility of performing such an "a-priori" classification will permit the design of "on-line" software architectures that
compute the class to which a froth belongs and then estimate the unknown mineral concentration by means of
algorithms linked to each class. In order to reach this goal, recognition algorithms for classification of the
processed froths are under study. The results will be published soon.

ACKNOWLEDGEMENT 

This  work  was  financially  supported  within  the  framework  of  ChaCo  (Characterization  of  Flotation  Froth 
Structure  and  Color  by  Machine  Vision)  Esprit  long  term  RP-N.24931  of  the  EEC.  Froth  image  acquisition 
and  chemical  data  collection  and  analyses  were  carried  out  by  Boliden  AB  at  Garpenberg  processing  plant. 

REFERENCES 

1.  D.J.  McKee,  1991.  Automatic  flotation  control  -  a  review  of  20  years  of  effort.  Miner.  Eng.,  4,  653-66. 

2. E.T. Woodburn, J.B. Stockton and D.J. Robbins, 1989. Vision-based characterization of three-phase
froths. International Colloquium: Developments in Froth Flotation, South African Inst. of Min. and Met.,
Gordon's Bay, RSA, 1, 1-20.

3. D.W. Moolman, J.J. Eksteen, C. Aldrich and J.S.J. Van Deventer, 1996. The significance of flotation
froth appearance for machine vision control. Int. J. Miner. Process., 48, 135-158.

4. S. Beucher and F. Meyer, 1993. Morphological approach to segmentation: the watershed transformation. In:
Math. Morphology and Image Proc. (Ed. E.R. Dougherty), M. Dekker, NY, 433-81.

5. L. Vincent and P. Soille, 1991. Watersheds in digital spaces: an efficient algorithm based on
immersion simulations. IEEE Trans. Pattern Anal. Mach. Intell., 13, 583-598.

6. G. Bonifazi, S. Serranti, F. Volpe and R. Zuco, 1998. Flotation froth characterization by optical-digital
sectioning. 4th Inter. Conf. on Quality Control by Artificial Vision, Takamatsu, Japan, 131-137.




A  Robust  Bubble  Delineation  Algorithm  for  Froth  Images 

Weixing  Wang  and  Ove  Stephansson 

Engineering  Geology,  Department  of  Civil  &  Environmental  Engineering 
Royal Institute of Technology, SE-10044 Stockholm, Sweden
E-mail: weixing.wang@imenco.se, ove@aom.kth.se


ABSTRACT 

This paper describes a robust segmentation algorithm for froth images from flotation cells in mineral
processing. The size, shape, texture and color of bubbles in a froth image are very important information for
optimizing flotation. To determine these parameters, the bubbles in a froth image have to be delineated
first. Due to the special characteristics of froth images and the large variation in froth image patterns and
quality, it is difficult to use classical segmentation algorithms. Therefore, a new segmentation algorithm
was developed to delineate every individual bubble in a froth image.

The new segmentation algorithm is based on valley-edge detection and edge tracing. In
order to detect bubble edges clearly and disregard the edges of the white spots, the algorithm detects only
valley-edges between bubbles in the first step. It examines each image pixel to find out whether it is the lowest valley
point in a certain direction. If it is, the pixel is marked as an edge candidate. Before this procedure, to
suppress noise edges, an image enhancement procedure is applied to filter out the noise pixels.

After valley-edge detection, the majority of edges have been marked in a single pass, but some small gaps between
edges, and some noise, still exist in the image. To reduce the noise, a clean-up procedure was developed. To fill
the gaps, an edge tracing algorithm is applied, in which edges are first thinned to one pixel width.
Endpoints and their directions are detected, and edge tracing starts from the detected endpoints. When a
new valley-edge pixel is found, the algorithm uses it as a new endpoint, and the valley-edge tracing
procedure continues until the contour of a bubble is closed.

The segmentation algorithm has been tested on images from Pyhasalmi mine in Finland and Garpenberg
mine in Sweden. The algorithm is much faster than normal morphological
segmentation algorithms, and its accuracy is better than that of manual segmentation.

Keywords:  image  enhancement,  valley-edge  detection,  segmentation,  froth  image,  bubble  delineation. 


INTRODUCTION 

The project ChaCo (Characterization of Flotation Froth Structure and Color by Machine Vision) is
conducted within Esprit-RTD in Information Technologies of the 4th EU program. The aim of the project
is to develop an on-line optical system for monitoring mineral froth and to optimize flotation in mineral
processing. The system consists of three parts: (1) an illumination system for grabbing high quality froth
images; (2) image processing and analysis, which provide the visual information for froth modeling;
and (3) froth modeling for controlling mineral processing. For image processing and analysis, image
segmentation is very important for classification and for analyzing bubble mobility, stability, color, texture,
size and shape.

In order to identify the characteristics of froth images, hundreds of images were investigated. The results
show that: (1) bubbles touch each other and there is no void space (background) between bubbles; (2) the
illumination on the bubble surface is uneven; (3) the edges between bubbles are weak; and (4) strong edges
exist at the border of specularity regions (white spots). All of these characteristics complicate the
development of a segmentation algorithm for froth images. During the investigation, several
commercial software packages for image segmentation were used. Previous aggregate segmentation algorithms [1-3]
were tested too. The segmentation tests indicated that:


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




• It is impossible to use intensity similarity to segment bubbles in a froth image, because the gray value
variation between bubbles is too small, and the variation in the interior of a bubble is large. The
segmentation results showed both over-segmentation and under-segmentation problems.

• The classical edge detection functions fail to segment froth images [4]. The reason is that boundaries
between bubbles have a low gradient magnitude value, while the boundaries of the specularity parts
(white or light spots) have a high gradient magnitude value.

• The morphological segmentation algorithms (e.g. Watershed) can only be used for froth images where
the maximum in each bubble is easy to find. In most cases, this is extremely difficult [4-7].

Following the froth image investigation and segmentation testing, we developed a new segmentation algorithm based
on valley-edge detection and edge tracing. The new algorithm comprises image enhancement and
valley-edge detection, followed by edge smoothing and tracing.


IMAGE  ENHANCEMENT  AND  VALLEY-EDGE  DETECTION 

In a froth image, the characteristics of surface bubbles are as shown in Fig. 1. Areas with high gray values are
white spots (e.g. the areas between lines 2 and 3, 5 and 6, and 8 and 9); the boundaries between bubbles are marked
by lines 1, 4 and 7, where the edge strength is weak. The classical edge detectors cannot be used for this kind of
image. The main reasons for this are: (1) white (or light) spots in the interior of bubbles strongly affect
the edge detection results; and (2) the gray value changes at the edges between bubbles are not
significant when applying the classic edge detection algorithms.


Fig.  1.  Gray  value  versus  pixels  of  a  schematic  cross-section  of  a  froth  image. 

Fig.  2a  is  an  example  of  an  original  froth  image,  and  Fig.  2b  is  the  gradient  magnitude  image  where  high 
strength  edges  are  located  on  the  boundaries  of  white  spots  inside  bubbles.  The  edges  between  bubbles  are 
very  weak. 

In  order  to  detect  bubble  edges  clearly  and  disregard  the  edges  of  the  white  spots,  a  new  valley-edge 
detection  algorithm  is  developed  as  follows: 

The new valley-edge detection algorithm examines each pixel to see if it is the lowest valley point in a certain
direction. If it is, the pixel is used as a valley-edge candidate, and its direction and location are marked. In
Fig. 3, when examining pixel p, we check different directions to find out if p is a valley-edge point. If it is,
it is marked as a valley-edge candidate. The diameter R is pre-determined from the image resolution and quality.

To understand the basic idea, we assume that a froth image is f; at point p, its gray value is f(i,j), and in the
α-direction, the gray values of the five lines on the left side are l1, l2, l3, l4 and l5, and on the right side r1, r2, r3, r4
and r5; l0 and r0 are both f(i,j).

We calculate the parameter Vα according to Eqs. (1) and (2). If Vα is greater than a pre-determined threshold
T, the detected point p can be accepted as a valley-edge point.


The gray value for each of the lines in a triangle (see Fig. 3b) can be a weighted average gray value or a
median gray value, chosen according to the number of pixels in each line and the image quality. In
real applications, the lines can be curves. The number of detecting directions can be chosen according to the image
resolution and bubble size (in pixel units). One question to be resolved in this algorithm is how to select the
threshold T; a single threshold may fail to detect some valley-edges. To increase the detection accuracy,
one option is to set up two thresholds T1 and T2 (T2 > T1). When Vα > T2, the detecting point is accepted
as a valley-edge point, but when T1 < Vα < T2, it is uncertain whether the detecting point is a valley-edge point.
For this uncertain or vague situation, fuzzy mathematics is applied. A detailed discussion and
the practice of this method will be presented in a following paper.




Fig. 2. Edge detection result of a froth image. a: Original froth image. b: Gradient magnitude image.


Fig. 3. The diagram for the valley-edge detection algorithm. a: assuming a circle (diameter R) surrounding the
detected point p, with four basic detecting directions (0, 45, 90 and 135°); b: p is a valley point
between 2 bubbles in the α-direction; its detecting area is 2 triangles, and in each of the triangles there
are 5 lines of pixels, so the diameter R of the circle in Fig. 3a is 11 pixel units; c: curves show that
detecting point p is a significant valley-edge point in the α-direction, but not in the β-direction.




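\[ V_{\alpha l} = \sum_{k=1}^{5} w(k) \cdot (l_k - l_{k-1}); \qquad V_{\alpha r} = \sum_{k=1}^{5} w(k) \cdot (r_k - r_{k-1}) \qquad (1) \]

where w(k) is a weight function (here k = 1, 2, ..., 5).

\[ V_{\alpha} = V_{\alpha l} + V_{\alpha r} > T, \qquad V_{\alpha l} > 0 \ \text{and} \ V_{\alpha r} > 0 \qquad (2) \]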
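
A minimal sketch of the valley-edge test of Eqs. (1) and (2) is given below. It makes several assumptions that are not part of the description above: each "line" l_k or r_k is reduced to the single pixel at distance k from p along the detecting direction (rather than an averaged or median line inside a triangle), the weights w(k) are all 1 by default, and a single threshold T is used. The function names are hypothetical.

```python
# Unit steps (row, col) for the four basic detecting directions of Fig. 3a.
DIRECTIONS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def valley_strength(img, i, j, step, weights):
    """Return (V_left, V_right) of Eq. (1) at pixel (i, j) along one direction."""
    di, dj = step
    n = len(weights)
    left = [img[i - k * di][j - k * dj] for k in range(n + 1)]    # l_0 .. l_n
    right = [img[i + k * di][j + k * dj] for k in range(n + 1)]   # r_0 .. r_n
    v_left = sum(w * (left[k] - left[k - 1]) for k, w in enumerate(weights, start=1))
    v_right = sum(w * (right[k] - right[k - 1]) for k, w in enumerate(weights, start=1))
    return v_left, v_right


def detect_valley_edges(img, threshold, weights=(1, 1, 1, 1, 1)):
    """Mark pixels that satisfy Eq. (2) in at least one detecting direction."""
    rows, cols = len(img), len(img[0])
    n = len(weights)
    edges = [[False] * cols for _ in range(rows)]
    for i in range(n, rows - n):              # a border of n pixels is skipped
        for j in range(n, cols - n):
            for step in DIRECTIONS.values():
                v_left, v_right = valley_strength(img, i, j, step, weights)
                if v_left > 0 and v_right > 0 and v_left + v_right > threshold:
                    edges[i][j] = True
                    break
    return edges


# Synthetic image: a dark vertical valley at column 8 between two bright slopes.
image = [[abs(j - 8) * 10 for j in range(17)] for _ in range(17)]
valley = detect_valley_edges(image, threshold=90)
```

With uniform weights the sums in Eq. (1) telescope to l_5 - l_0 and r_5 - r_0, so pixels next to the valley floor also score highly and a lower threshold would mark a thicker band; weights that emphasise the nearest lines, or the thinning step described later, would narrow the response.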


Figure  4a  shows  the  detection  result  from  the  image  in  Fig.  2a,  where  most  of  the  valley  points  are  marked, 
but  there  is  still  much  noise  in  the  image.  To  alleviate  the  noise,  two  image  enhancement  functions  are 
added;  one  is  used  prior  to  valley-edge  detection,  and  another  is  used  as  a  post-processing  function. 


Fig. 4. Valley-edge detection results of a froth image. a: Valley-edge detection directly on an original image
(see Fig. 2a). b: Valley-edge detection on the smoothed image, for which an average smoothing filter is used.

Normally, original images include a lot of noise, which affects the valley-edge detection results. One simple way
to reduce noise is to use a smoothing filter such as a Gaussian smoothing function. The result of valley-edge
detection on the smoothed image is shown in Fig. 4b, where most of the significant edges are present. As the resulting
image is not completely satisfactory, a post-processing subroutine must be added. In post-processing,
several functions are used, namely thinning, linking of small gaps, removal of short curves or lines, etc.


EDGE  TRACING 

After image enhancement and valley-edge detection, the valley-edges are thinned to one pixel width,
but some gaps and noise still exist in the image. To close the contours of bubbles, one must perform edge
tracing or contour tracing. To do this, the new algorithm detects significant endpoints of curves (or lines),
then estimates the direction for each endpoint based on local valley-edge pixel directions. Finally, the
algorithm traces contours according to the direction of each newly detected pixel (new
endpoint) and an intensity cost function: valley-edge tracing starts from a detected endpoint and
checks which neighboring pixel has the lowest gray value. When a new pixel is found to be a valley-edge point, it is
used as the new endpoint. The tracing procedure continues until one of the bubble contours is closed, and
then starts again from another detected endpoint. When no detected endpoints remain for further tracing,
the valley-edge tracing procedure stops.
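
The tracing step can be sketched as a greedy walk, under assumptions that go beyond the description above: an endpoint is taken to be an edge pixel with exactly one 8-connected edge neighbour, its initial direction points away from that neighbour, and at each step only the three pixels ahead are considered, the one with the lowest gray value being chosen. The function names and the step limit are hypothetical, and the intensity cost function actually used may differ.

```python
# 8-neighbour offsets (row, col), listed clockwise starting from "up".
NEIGHBORS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def find_endpoints(edges):
    """Return (row, col, heading) for interior edge pixels with exactly one edge neighbour."""
    rows, cols = len(edges), len(edges[0])
    endpoints = []
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            if not edges[i][j]:
                continue
            hits = [d for d, (di, dj) in enumerate(NEIGHBORS) if edges[i + di][j + dj]]
            if len(hits) == 1:
                endpoints.append((i, j, (hits[0] + 4) % 8))   # head away from the neighbour
    return endpoints


def trace_from(gray, edges, i, j, heading, max_steps=50):
    """Extend an edge from an endpoint (modifies edges); True if it joins another edge."""
    rows, cols = len(edges), len(edges[0])
    for _ in range(max_steps):
        candidates = []
        for turn in (-1, 0, 1):                               # the three pixels ahead
            d = (heading + turn) % 8
            ni, nj = i + NEIGHBORS[d][0], j + NEIGHBORS[d][1]
            if 0 <= ni < rows and 0 <= nj < cols:
                candidates.append((gray[ni][nj], d, ni, nj))
        if not candidates:
            return False                                      # walked off the image
        _, heading, i, j = min(candidates)                    # lowest gray value wins
        if edges[i][j]:
            return True                                       # gap closed
        edges[i][j] = True
    return False


# A dark horizontal valley (row 4) whose detected edge has a 3-pixel gap.
gray = [[100] * 9 for _ in range(9)]
gray[4] = [10] * 9
edges = [[False] * 9 for _ in range(9)]
for col in (0, 1, 2, 6, 7, 8):
    edges[4][col] = True
ends = find_endpoints(edges)
closed = trace_from(gray, edges, *ends[0])
```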




Two results obtained with the new algorithm are shown in Fig. 5. In Fig. 5a, the original image (Fig. 2a) size is
256x256 pixels, and the number of bubbles is about 370. The segmentation result is quite good. In Fig. 5b,
the image size is the same as in Fig. 5a, but the number of bubbles is about 1525, and the segmentation
result by the new algorithm is reasonable.

To evaluate the new segmentation algorithm, about 10 froth images were segmented manually using
interactive image processing and analysis software. The image size is 384x288, and the number
of bubbles in an image varies from 300 to 3000. For each of the images, the number of bubbles was
counted three times by an operator, or counted by three different operators at the same time. On average,
one counting of each image took about two hours. The new algorithm took only about 2-3 seconds on a PC
with a clock speed of about 200 MHz. A variation of 20-35% was obtained among the three manual
countings of each image. In particular, for the images with more than 1500 bubbles, the variation in the number
of bubbles between countings is large. When the average number of bubbles counted manually in an
image was compared to the number of bubbles counted by the new algorithm, the difference was about 10%.


Fig. 5. Segmentation results on two different types of images, by the new algorithm.
a: Segmentation result of 370 bubbles. b: Segmentation result of 1525 bubbles.


CONCLUSIONS 

In this contribution, a new segmentation algorithm based on valley-edge detection and edge tracing has
been presented. It was especially designed for froth images from flotation cells in mineral processing. The
algorithm first uses a Gaussian filter or average filter to smooth the original image. Then it detects
valley-edges between bubbles. Thirdly, it performs valley-edge tracing based on both the post-processed valley-edge
image and the original image. The segmentation algorithm has been tested on froth images from Pyhasalmi
mine in Finland and Garpenberg mine in Sweden. The algorithm is much faster
than normal morphological segmentation algorithms. Its accuracy is better than that of manual counting. The
test results show that the algorithm works satisfactorily.


ACKNOWLEDGMENT 

This contribution is part of the European Commission DG III - Industry, Esprit Project No. 24931-CHACO.
Froth images were provided by Garpenberg mine of Boliden Minerals, Sweden and Pyhasalmi
mine of Outokumpu Oy, Finland. Their contribution is acknowledged.




REFERENCES 

1. W.X. Wang, 1999. Image analysis of aggregates. Computers & Geosciences, 25, 71-81.

2. O. Stephansson, W.X. Wang and S. Dahlhielm, 1992. Automatic image processing of aggregates.
ISRM Symposium: EUROCK '92, Chester, UK, 14-17 September, British Geotechnical Society,
London, UK, pp. 31-35.

3. W.X. Wang, F. Bergholm and O. Stephansson, 1996. Program Library for Image Analysis of
Aggregates. 20th Conference for Mineral Technique, Lulea, Sweden, 13-14 February, pp. 159-168.

4. N. Sadr-Kazemi and J.J. Cilliers, 1997. An image processing algorithm for measurement of flotation
froth bubble size and shape distributions. Minerals Engineering, 10(10), 1075-1083.

5. P.J. Symonds and G. De Jager, 1992. A technique for automatically segmenting images of the surface
froth structures that are prevalent in flotation cells. Proceedings of the 1992 South African Symposium
on Communications and Signal Processing, University of Cape Town, Rondebosch, South Africa, 11
September, pp. 111-115.

6. M. Guarini, A. Cipriano, A. Soto and A. Guesalaga, 1995. Using image processing techniques to
evaluate the quality of mineral flotation process. In Proceedings of the 6th International Conference on
Signal Processing, Applications and Technology, Boston, October 24-25, pp. 1227-1231.

7. A. Cipriano, M. Guarini, R. Vidal, A. Soto et al., 1998. A real time visual sensor for supervision of
flotation cells. Minerals Engineering, 11(6), 489-499.


477 


The  Characterization  of  Flotation  by 
Colour  Information  and  Selecting  the  Proper  Equipment 

A.K.  Siren 

VTT  Information  Technology,  Printed  Communication, 
Tekniikantie  4B,  Espoo,  02044  VTT,  Otaniemi,  Finland 
Tel:  358-9-456-5898  Fax:358-9-455-2839  Email:  ari.siren@vtt.fi 


ABSTRACT 

Flotation is the most common industrial method by which valuable minerals are separated from waste rock after the ore has been crushed and ground. For process control, flotation plants and devices are equipped with conventional and specialized sensors. However, certain variables, such as the colour of the froth and the size of the bubbles in the froth, are left to the visual observation of the operator.

The ChaCo project (European Union project 24931) started in November 1997. In this project a measuring station was built at the Pyhasalmi flotation plant. The system includes an RGB camera and a spectral colour measurement instrument for inspecting froth colour. The RGB camera signal and the visible spectral range are also measured so that they can be compared with the operators' comments on froth colour in relation to sphalerite concentration and process balance.

Mixtures of dried sphalerite and iron pyrite in different ratios were also studied to find the typical spectral features of the minerals. The correlation between the spectral reflectance of sphalerite at different wavelengths and the sphalerite concentration is used to select a suitable camera system with filters and to compare the results with the colour information from the RGB camera.

Candidate machine vision techniques for this application are discussed, and the pre-processed information on dried mineral colours is used and adapted to the on-line measuring station.

Moving bubbles in the froth produce total reflections that disturb the colour information. Polarization filters are used and the results are reported. Reflectance outside the visible range is also studied and reported for this application.


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




Intelligence  in  Environmental  Applications 




Robust  Engineering  Approaches  to  Maximize  Results  in 
Business,  Cost,  Engineering,  Human,  Quality  and 
System  Technologies 

Roberto  C.  Villas  Boas 


CYTED  -  Science  and  Technology  for  Development  in  Iberoamerica, 
Mineral  Technology  Sub-Program,  Madrid,  Spain 
Email:  villasboas@cetem.gov.br 


ABSTRACT 

Robust Engineering is a new branch of engineering techniques developed by Genichi Taguchi in the early fifties in Japan. It is now in wide use throughout the western world, after the tremendous success of several industrial applications. These successes can be grasped with a quick look at the home page of the American Supplier Institute: http://www.amsup.com/TAGUCHI.


INTRODUCTION 

The basic ideas of Taguchi are well described in the Harvard Business Review, Jan-Feb 1990 edition, and elsewhere [1,2]. They consist of taking care of a product or process from the very beginning, i.e., from when the concepts are just being formulated, well before production or manufacture starts. The main idea is to avoid selecting a manufacturing route or process development simply because it stays within certain established tolerances; instead, one must force oneself to stay as close as possible to the target, aiming at "the value" rather than at "an acceptable range of variation".

A very interesting feature when designing for competitive advantage is that "value" is defined as a measure of choice; see Dean, E.B., at http://mijuno.larc.nasa.gov/dfc/value.html. Moreover, robust products or projects have "strong signals", despite the extensive external "noise" to which they are subjected. These thoughts are well illustrated in everyday practice, when a given product or process consisting of several parts (or unit operations, for processes) is to be manufactured or run. Every part or unit operation has its own "tolerances", and when these are coupled the variances of all the parts or unit operations add up, which may result in a product or process that performs badly or fails.

The basis of Robust Engineering is D.O.E. (Design of Experiments), a set of mathematical tools which are well established in the literature and which can be thought of as being devised for competitive advantage. The originality of the Taguchi approach was to utilize fractional factorial design arrays that could provide the performance or endurance of a product or process when subjected to both controlled and uncontrolled variables that affect such performance and endurance. The utilization of fractional factorial designs for controlled variables is not a Taguchi achievement. Rather, his main contribution to engineering was the proposal of a specific design plan subject to a set of uncontrolled variables that affect the process or product performance. Such uncontrolled variables might be humidity, temperature, cultural habits in utilizing a given process or product, lack of particular expensive equipment for increased performance of the results, etc. The statistical analysis he proposed, which is now being used by engineers around the world, is very simple, but has been heavily criticized by many statisticians [3,4,5]. Despite the apparent shortcomings on strictly statistical grounds, the method of analysis works in the real world, and the supposedly "better" statistical approaches that have been proposed generally reach the same conclusions.

A discussion of some topics of interest in Robust Engineering is fundamental for the reader to understand the overall issue.

DOE Techniques - Design of Experiments (DOE) involves the application of certain geometric principles to the design space in such a way that an algorithm indicating the sampling route to be followed is obtained. In the presentation, factorial experiments will be fully described and analyzed through real case-studies; some important fractional factorial designs, such as Taguchi's methods, will be demonstrated in some real case applications. Several Response Surface designs will be discussed and analyzed.

Taguchi's Orthogonal Arrays - The utilization of orthogonal arrays has long been known in DOE and several experiments are based on these elements. The innovation that Taguchi brought to the discussion was the establishment of arrays to study controllable as well as uncontrollable variables that affect experiments and, consequently, the processes and products.


ROBUST  DESIGNS 

Robust designs are conducted for optimization. A mathematical transform is utilized to achieve this: the signal-to-noise ratio, quite arbitrarily selected by Taguchi as the transform to be always used. Such a ratio gives an indication of how close the design is to the optimum performance of a given process or product. For a discussion about the measurement of robustness, see http://www.amsup.com/TAGUCHI/ROBUST/

The most interesting proposal is that research engineers may choose a variety of designs that best suit the engineering goals. These designs, called Taguchi arrays, are fractional factorials and can be derived by applying the principles of constructing such factorials [6,7]. The literature contains several of these arrays [1,2,8], represented by the letter L, sometimes A, followed by a subscript indicating the number of actual experimental runs. Depending on the source of the plan, there may also be parentheses containing the original basic design that was fractionated. An indication of the number and levels of variables is also given in the plan as an orthogonal array. Computation of the effect of each particular variable when subjected to variations in the levels used is very simple.
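As an illustration of what such an array looks like, the following minimal sketch writes out the commonly tabulated L9 array (nine runs, four three-level columns) and checks the balance property that makes it orthogonal. The level coding 1..3 and the function name `is_orthogonal` are assumptions made for this example; they are not taken from the paper.

```python
from itertools import combinations, product

# Commonly tabulated L9 array: 9 runs, 4 columns, levels coded 1..3.
L9 = [
    (1, 1, 1, 1),
    (1, 2, 2, 2),
    (1, 3, 3, 3),
    (2, 1, 2, 3),
    (2, 2, 3, 1),
    (2, 3, 1, 2),
    (3, 1, 3, 2),
    (3, 2, 1, 3),
    (3, 3, 2, 1),
]


def is_orthogonal(array, levels=(1, 2, 3)):
    """Return True if every pair of columns shows each level pair equally often."""
    n_cols = len(array[0])
    expected = len(array) // (len(levels) ** 2)
    for a, b in combinations(range(n_cols), 2):
        for pair in product(levels, repeat=2):
            count = sum(1 for row in array if (row[a], row[b]) == pair)
            if count != expected:
                return False
    return True


print(is_orthogonal(L9))
print(len(L9), "runs instead of", 3 ** 4, "for the full factorial")
```

The check makes the "fractional factorial" character visible: nine runs stand in for the 81 runs of the full four-variable, three-level design, while every pair of variables is still tested at all nine level combinations.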

Taguchi's  main  contribution  to  D.O.E.  is  the  introduction  of  uncontrolled  variables  within  a  proposed 
design.  This  is  done  via  the  following  sequence: 

a) identify the controlled variables for the process or product;

b)  identify  the  uncontrolled  variables  that  affect  the  process  or  product  performance; 

c)  choose,  or  construct,  arrays  for  the  controlled  and  uncontrolled  variables  in  such  a  way  that  the  design 
plan  for  the  controlled  variables  is  obtained  within  the  design  plan  for  the  uncontrolled  ones,  thus 
obtaining  the  desired  responses  for  the  problem; 

d)  analyze  the  overall  design  and  set  new  levels  for  the  controlled  variables  using  the  mathematical 
transform,  "signal-to-noise  ratio",  i.e.,  the  performance  statistic  that  estimates  the  effect  of  the 
uncontrolled  variables  on  the  response(s); 

e)  run  the  experimental  design  again  and  check  if  the  new  design  satisfies  the  improvements  suggested  by 
the  mathematical  transform,  i.e.,  establish  the  performance  statistics. 
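The sequence above can be sketched as a crossed plan in which every row of the controlled-variable array is run at every setting of the uncontrolled variable, after which each row is summarized by its mean and variance. This is only a sketch under stated assumptions: the four-run inner array, the noise labels and the response values are hypothetical, and the names `crossed_plan` and `summarize` are introduced here for illustration.

```python
from statistics import mean, variance


def crossed_plan(inner_array, noise_levels):
    """List every trial: each inner-array row repeated at every noise setting."""
    return [(run, settings, noise)
            for run, settings in enumerate(inner_array, start=1)
            for noise in noise_levels]


def summarize(responses):
    """Map run number -> (mean, sample variance) over the noise settings."""
    return {run: (mean(ys), variance(ys)) for run, ys in responses.items()}


# Hypothetical inner array: 4 runs for three two-level controlled variables.
inner = [(1, 1, 1), (1, 2, 2), (2, 1, 2), (2, 2, 1)]
noise = ("low", "high")  # hypothetical settings of one uncontrolled variable

plan = crossed_plan(inner, noise)
print(len(plan), "trials")

# Made-up responses, one value per noise setting, for illustration only.
measured = {1: [10.0, 12.0], 2: [11.0, 11.5], 3: [9.0, 13.0], 4: [12.0, 12.5]}
for run, (m, v) in summarize(measured).items():
    print(run, m, v)
```

The variance across the noise settings is what the performance statistic of step (d) is meant to capture: a row with a high mean but a large spread is less robust than one with a slightly lower mean and a small spread.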

The  mathematical  transform 

Here lies one of the major criticisms of Taguchi's approach: the use of the so-called signal-to-noise ratio as the sole mathematical transform at all times (see [3] for a complete review).

There  are  three  cases  that  the  research-engineer  can  choose  for  his  performance  statistic: 

a) a specific target value is best

b) minimization, i.e., the smaller, the better

c) maximization, i.e., the larger, the better
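The paper does not write out the three transforms. A minimal sketch, assuming the forms usually quoted in the Taguchi literature (in decibels), is given below; the function names are chosen for this example only.

```python
import math
from statistics import mean, variance


def sn_nominal_the_best(ys):
    """Target value is best: 10*log10(mean^2 / sample variance)."""
    return 10.0 * math.log10(mean(ys) ** 2 / variance(ys))


def sn_smaller_the_better(ys):
    """Minimization: -10*log10(mean of y^2)."""
    return -10.0 * math.log10(mean([y * y for y in ys]))


def sn_larger_the_better(ys):
    """Maximization: -10*log10(mean of 1/y^2)."""
    return -10.0 * math.log10(mean([1.0 / (y * y) for y in ys]))


# A single response of 92.41 (a density in %, as in the experiment below).
print(round(sn_larger_the_better([92.41]), 2))
```

With a single value y the larger-the-better form reduces to 20*log10(y), which is why responses that differ by only a few percent give S/N values that lie very close together.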


AN  EXPERIMENT 

For the sake of illustration, let's perform an experiment according to the sequence suggested. Let's study the production of zinc oxide, where dense pieces are a must in order to enhance the electrical properties of the final product [9]; thus, the response to be examined is the real density of the produced piece. Let's follow the sequence:


483 


a)  identify  the  controlled  variables. 

1. temperature in °C, at three levels: 1100, 1200 and 1300.

2. time in hours, at three levels: 2, 3 and 4.

3. oxalic acid dispersant in percent, at three levels: 0, 7.5 and 15.

4. polyvinyl alcohol ligand in percent, at three levels: 0, 1 and 2.

b) identify the uncontrolled variable(s).

furnace ventilation, at three levels: ambient, forced ventilation low, forced ventilation high.

c)  the  array 

1. an L9 for the controlled variables was chosen

2. taken at the three settings of the uncontrolled variable

3. the real density in % was obtained as the response for every setting defined, in replicates (although unnecessary), and the averages were assessed.

4. these averages, following the L9 run order, were:

1  =  92.41 

2  =  90.09 

3  =  89.92 

4  =  91.74 

5  =  91.72 

6  =  92.17 

7  =  91.69 

8  =  93.04 

9  =  91.88 

d) the performance statistics (signal-to-noise ratio)

1. for this specific case study, the selected S/N transform was the larger, the better.

2. the computed S/N ratios were quite near to one another, as follows:

1  =  39.31 

2  =  39.09 

3  =  39.08 

4  =  39.25 

5  =  39.25 

6  =  39.29 

7  =  39.25 

8  =  39.37 

9  =  39.26 

3.  so  it  was  decided  by  the  research  team  to  compute  the  variance  for  each  experiment: 

1  =  0.078 

2  =  0.031 

3  =  0.220 

4  =  0.046 

5  =  0.083 

6  =  0.007 

7  =  0.019 

8  =  0.013 

9  =  0.004 

4. and so, the option for analyzing the data was to determine the effect of each variable on the final response (average, S/N and variance), within their levels, instead of referring to the classical ANOVA. In fact, such an analysis gives the controlling levels for each variable to increase the average density, decrease the variance, and maximize the S/N, as follows:

Temperature = 1300 °C (average up by 0.57, variance down by 0.043 and S/N up by 0.054);

Time = 2 hours (average up by 0.32, variance down by 0.008 and S/N up by 0.030);

Dispersant = 0% (average up by 0.91, variance down by 0.023 and S/N up by 0.086);

Ligand = maybe at 0.5%

since at 0% (average was up, variance was unchanged and S/N was up)

and at 1% (average was down, variance was down a lot, and S/N was down)




5. For the uncontrolled variable, furnace ventilation, although Level 2 is indicated because it provided an increase in the average, it also introduced the highest variance and an insignificant change in S/N.

e) run the experimental design again: the experiment was performed with the new conditions suggested by (d) and with no furnace ventilation, and the results showed an improvement in the average with minimum variance and maximum S/N.
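The level-by-level analysis of step (d) can be sketched from the run averages, S/N ratios and variances listed above. The paper does not state which L9 column was assigned to which variable; the sketch assumes the commonly tabulated L9 with columns taken in the order temperature, time, dispersant, ligand. Under that assumption the computed effects come out close to the figures quoted in the text (for example about +0.57 on the average for temperature at its third level). The function name `level_effects` is introduced here for illustration.

```python
from statistics import mean

# Commonly tabulated L9 array, levels coded 1..3. The column order
# (temperature, time, dispersant, ligand) is an assumption, not from the paper.
L9 = [
    (1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3),
    (2, 1, 2, 3), (2, 2, 3, 1), (2, 3, 1, 2),
    (3, 1, 3, 2), (3, 2, 1, 3), (3, 3, 2, 1),
]
FACTORS = ("temperature", "time", "dispersant", "ligand")

# Values reported in the text for runs 1..9.
AVERAGE = [92.41, 90.09, 89.92, 91.74, 91.72, 92.17, 91.69, 93.04, 91.88]
SN = [39.31, 39.09, 39.08, 39.25, 39.25, 39.29, 39.25, 39.37, 39.26]
VARIANCE = [0.078, 0.031, 0.220, 0.046, 0.083, 0.007, 0.019, 0.013, 0.004]


def level_effects(array, column, response):
    """Mean response at each level of one column, minus the grand mean."""
    grand = mean(response)
    effects = {}
    for level in sorted({row[column] for row in array}):
        at_level = [y for row, y in zip(array, response) if row[column] == level]
        effects[level] = mean(at_level) - grand
    return effects


for col, name in enumerate(FACTORS):
    for label, response in (("average", AVERAGE), ("S/N", SN), ("variance", VARIANCE)):
        effects = level_effects(L9, col, response)
        print(name, label, {lvl: round(e, 3) for lvl, e in effects.items()})
```

Each effect is simply the difference between a level mean and the grand mean, so no analysis of variance is needed to rank the levels, which is the simplicity the text refers to.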


CONCLUSION 

Thus, the so-called Robust Designs are orthogonal arrays formally identified with fractional factorial designs. They consist of a design matrix for the controlled variables and another for the uncontrolled variables, arranged in such a way that the response is obtained under the settings of the uncontrolled-variables matrix, which gives the variances of the system. The response is then analyzed via the mathematical transform, the "signal-to-noise ratio", to optimize the controlled variables and their levels.

It is indeed quite a simple procedure, and it has the advantage of building robustness in at the earlier stages of the process/product cycle.


REFERENCES 

1. Taguchi, G., 1987. System of Experimental Design: Engineering Methods to Optimize Quality and Minimize Costs, UNIPUB, White Plains, N.Y.

2. Taguchi, G., Yokoyama, Y., 1994. Taguchi Methods: Design of Experiments, ASI, Dearborn, MI.

3. Box, G., 1988. Signal to Noise Ratios, Performance Criteria and Transformations, Technometrics, 30(1), 1-17.

4. Berk, K.N., Picard, R.R., 1991. Significance Tests for Saturated Orthogonal Arrays, J. Qual. Tech., 23(2), 79-89.

5. Lucas, J.M., 1994. How to Achieve a Robust Process Using Response Surface Methodology, J. Qual. Tech., 26(4), 248-260.

6. Kacker, R.N. et al., 1991. Taguchi's Fixed Element Arrays are Fractional Factorials, J. Qual. Tech., 23(2), 107-116.

7. Villas Boas, R.C., 1996. Arranjos Ortogonais de Taguchi: os Ln(2k); Serie Qualidade e Produtividade, 9, 1-14, Rio de Janeiro.

8. Fowlkes, W.Y., Creveling, C.M., 1995. Engineering Methods for Robust Product Design: Using Taguchi Methods in Technology and Product Development, Addison-Wesley Publishing Co., Reading, MA.

9. Duarte, M.V.S. et al., 1998. Zinc Oxide for Varistor Manufacturing; Class Exercise under Prof. R.C. Villas Boas, The Federal University of Rio de Janeiro, COPPE, Rio de Janeiro.




Imaging  Techniques  for  Process  Optimization 
and  Control  in  Glass  Recycling 

Giuseppe  Bonifazi,  Paolo  Massacci 

Dipartimento di Ingegneria Chimica, dei Materiali, delle Materie Prime e Metallurgia
Universita di Roma "La Sapienza", Via Eudossiana 18, 00184 Rome, Italy


ABSTRACT 

Glass fragments (cullets) to be recycled have different market values according to their color. Glass recycling plants sort cullets mainly by discriminating colored glass from white and half-white glass; furthermore, sorting has technological limits concerning the minimum cullet size that can be analyzed, about 4-5 mm. Cullets collected without distinction of color can be used primarily for the production of green glass and only in part for the production of yellow glass. The production of white glass requires that only cullets of that color be employed. At present, machines for separating cullets according to color are not capable of classifying all the different types efficiently. This paper analyzes the possibilities offered by a color-imaging-based approach to cullet sorting that analyzes the textural attributes of the cullet image field. The study focused mainly on the effects that cullet surface status and characteristics have on the detected color-textural characteristics, and on how they can influence subsequent classification. All tests were performed on glass samples as they result after the cleaning stage (impurity removal) of an industrial glass recycling plant.


INTRODUCTION 

Glass can historically be considered one of the most recycled materials. Glass recycling can follow different rules according to the glass product, such as containers (vases and bottles), plates (domestic and industrial), glass works wastes, etc., or according to characteristics and destination of use. Glass can be "easily" recycled thanks to some intrinsic characteristics: i) it is non-absorbent and has no intrinsic flavor or odor; ii) it resists the temperatures required for cleaning; and iii) its strength and mechanical resistance allow multiple filling and reuse. In some cases it is possible to collect and directly re-use the "objects" after cleaning, without any modification to their physical characteristics (such as comminution). Unfortunately, this strategy can only be applied in limited cases, considering the great amount of glass waste produced. In most cases, glass recycling means collecting and processing glass materials that usually result from the differentiated collection of urban waste. In this case, it is necessary to select specific and appropriate processing strategies aimed at "cleaning" the glass fragments of all impurities that would compromise re-use of the glass both at melting and in final product manufacturing. Together with "cleaning" and other important aspects linked to the recycling of glass fragments (cullets), there is the need for an additional "selection" of the glass according to its destination, based on its color characteristics. Successful achievement of these goals, together with the intrinsic physical-chemical properties of glass, makes it possible to reach several positive targets: i) savings of the raw materials necessary for production; ii) a reduction in energy consumption; and iii) a reduction in the quantity of solid urban waste. All these aspects push the recycling industry to apply technology extensively in order to recycle glass material.

The  setup  of  efficient  and  reliable  cullet-processing  involves  defining  specific  rules  addressed  at: 

•  collection criteria and strategies directed at simplifying further processing;

•  identifying  all  "polluting"  materials  that  can  reduce  processing  efficiency  and,  in  some  cases, 
compromise  the  final  correct  re-use  of  cullet  materials; 

•  defining  suitable  strategies  to  remove  the  polluting  elements; 

•  realizing  a  "final"  separation  of  cullets  by  sorting  according  to  their  color. 




[Figure 1 image panels: dark green, light green, white, half-white and brown cullets, each shown at 2x2 mm and 16x16 mm.]

Fig. 1. Dark green, light green, white, half-white and brown cullets, sampled in the plant, washed with a chromic mixture (1 l concentrated H2SO4 and 30 g K2Cr2O7), rinsed with demineralised water and dried at 90°C. Images acquired at 2x2 mm (high magnification) and 16x16 mm (low magnification).

With reference to cullet sorting, in recent years glass works requirements have increased in terms of "acceptable polluting individuals" and, as a consequence, stricter rules must be applied for sorting, both in terms of "quality control of separation" and of the "lower size limits" of the cullets. Both rules, and especially the size criterion, are not easy to satisfy. At present, cullet sorting is carried out using a laser-beam-based device as the detector. The sorting logic is mainly analog: an "on-off" logic is applied.

Detection is, in fact, based on an evaluation of the "characteristics" of the energy and spectra received by a detector after a suitable laser beam has passed through the cullets. Technological limits can be divided into two classes: those related to the construction characteristics of the equipment and those related to the material (Figure 1). The first are mainly linked with the physical dimensions and mechanical arrangement of the optics used to generate signals, and with the pneumatic architecture that modifies a cullet's trajectory to sort it according to color. Furthermore, flow characteristics can influence selection. For efficient optical sorting, the flow must, in principle, consist of particles forming a monolayer. Under this condition, cullets can be analyzed one at a time by the laser beam. Recent equipment is so fast in its analysis that the same cullet can be tested several times, depending on its size. As a consequence, the larger the cullet, the better its control. The influence of "anomalies" (Figure 2) can thus be reduced, since each cullet is analyzed by more than one detector and more than once. For smaller pieces, such conditions are difficult to achieve, and these pieces can pass through unsorted. In addition, diffraction/refraction effects can be so strong (presence of marked cleavage or surface anomalies) that detectors are practically unable to analyze them. Establishing monolayer conditions and effective flowrates is also problematic. All these problems mean that sorting cannot be used profitably over the entire size range of cullets. Materials of smaller dimensions resulting from the processing-cleaning stages cannot be treated and so are usually rejected. This represents a double cost: glass material is lost, and money must be spent to store such products in a dump. The aim of this paper is to present the possibilities offered by digital color imaging for recognizing cullets with an average dimension of about 2 mm on the basis of their color characteristics, starting from an evaluation of their pictorial aspects and moving towards full control of the quality of the separated products when such strategies are applied.

CULLET  CHARACTERIZATION 

Cullet sorting is realized on the move. After a "cleaning" stage, the material is fed to a sorting system in order to realize the desired separation. As a consequence, material flow can be considered a complex domain constituted by several elements (glass particles), each characterized by specific attributes (size, shape, color and degree of dirtiness). Such properties are not easy to investigate by looking directly at the bulk material, especially when the task must be done "on-line" at industrial plant scale.


Fig.  2.  The  spectral  response  of  the  cullets,  suitably  energized,  is  based  on  an  evaluation  of  the 
transmitted  energy  received  by  the  detectors.  The  detected  energy  is  influenced  by  the  status 
(dirty  or  clean)  and  the  characteristics  (fragments  of  bottle  neck  or  vase  with  or  without 
thread, bottle or vase bottom, etc.) of the cullet surface.

2a:  16x16  mm  image  of  dirty  light  green  cullet;  2b:  16x16  mm  image  of  white  dirty  cullet; 

2c:  2x2  mm  image  of  white  cullet  (threaded  bottle  neck);  2d:  2x2  mm  image  of  half-white 
cullet  (bottle  bottom);  and  2e:  16x16  mm  image  of  dark  green  cullet  (bottle  bottom). 

Cullets  as  a  Source  of  Information 

A "layered" bulk solid sample [1] presents, in general, certain specific pictorial attributes perceived by our senses according to the lighting conditions and, as a consequence, to the spectral response of the surfaces constituted by the particles themselves. The spectral characteristics of each element of the surface are related to the physical characteristics of the constituting particles and to the energizing source (lighting). A digital color image is a discrete domain in which each constituting element (pixel) is characterized by a triplet of values representing three base color components (Figure 3). The source of information for each pixel is thus represented by the three values of the corresponding color components [2].
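A minimal sketch of how such pixel triplets might be reduced to per-field color attributes is shown below. It assumes 8-bit RGB pixels held as tuples and uses the HSV conversion from the Python standard library as a stand-in for the HSB system mentioned later in the paper; the function names and the sample pixel values are hypothetical.

```python
import colorsys


def mean_rgb(pixels):
    """Mean of each color component over a list of (R, G, B) pixel triplets."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def rgb_to_hsb(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation 0..1, brightness 0..1)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v


# Hypothetical pixels from a greenish image field, for illustration only.
field = [(40, 160, 60), (50, 170, 70), (45, 150, 65)]
r, g, b = mean_rgb(field)
print(rgb_to_hsb(r, g, b))
```

Averaging in RGB before converting avoids averaging hue angles directly, which would be misleading near the 0/360 degree wrap-around.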

Samples  and  Sampling 

All  tests  have  been  carried  out  by  adopting  a  set  of  samples  (cullets)  taken  from  the  feed  of  an  industrial 
recycling  plant.  Different  sample  sets,  according  to  different  colored  cullets,  were  identified  and  collected. 
The  cullets  were  manually  selected,  on  the  basis  of  their  color,  into  five  color  classes:  light  green,  dark 
green,  white,  half  white  and  brown  (Figure  1).  Each  cullet  was  classified  and  stored  to  be  always 
identifiable  in  the  course  of  subsequent  experiments  and  measurements. 

IMAGING  PROCEDURES 

Cullet image acquisition was performed by lighting each sample with transmitted light of known colour temperature (5000 K). The cullet was lit from below and the transmitted light was registered by an optical system equipped with a colour video-camera (i2s IEC800CC) connected to an RGB frame-grabber (Matrox MVP AT) installed on a standard PC. For each cullet, a randomly selected image field of 2x2 mm was examined using a Leica-Wild M8 stereo microscope. Such a procedure was adopted to analyze and critically evaluate the image fields in terms of the characteristics of the detected colors of the cullet sample and the status of their surfaces (dirty or clean). Furthermore, by adopting such a magnification, it was also possible to evaluate the influence of geometric characteristics (fragment of bottle neck or vase with or without thread, bottle or vase bottom, etc.) on the information collected by imaging. The collected digital image set was analyzed by adopting two different strategies:

•  a "simple" color spectra-based approach: each cullet was characterized, analyzed and classified according to its RGB (red, green, blue) spectral distribution [3];

•  a texture-based approach (Appendix): each cullet was characterized, analyzed and classified according to textural characteristics (spatial occurrences of color tone on the image) determined with reference to an HSB (hue, saturation, brightness) color reference system [1].

Several procedures can be used to estimate the sensitivity of these two approaches (color- and texture-based) with respect to the possibility of performing complete sorting of the five cullet color classes.
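The textural parameters themselves are defined in the paper's Appendix, which is not reproduced here. As a hedged illustration only, the sketch below computes a correlation measure from a symmetric gray-level co-occurrence matrix along the four directions 0°, 45°, 90° and 135°, in the style of Haralick texture features. It assumes the hue channel has already been quantized to integer levels 0..levels-1; all names are hypothetical, and the authors' definition may differ.

```python
# (row step, column step) for each direction of analysis.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def cooccurrence(image, offset, levels):
    """Normalized symmetric co-occurrence matrix for one pixel offset.

    Assumes the image has at least one neighbouring pixel pair for the offset.
    """
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count both orders: symmetric matrix
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


def correlation(p):
    """Correlation of the two levels in a pair, weighted by the matrix p."""
    idx = range(len(p))
    mu = sum(i * p[i][j] for i in idx for j in idx)
    var = sum((i - mu) ** 2 * p[i][j] for i in idx for j in idx)
    if var == 0:
        return 1.0  # convention chosen here for a perfectly uniform field
    cov = sum((i - mu) * (j - mu) * p[i][j] for i in idx for j in idx)
    return cov / var


def mean_correlation(image, levels):
    """Average of the correlation over the four directions."""
    values = [correlation(cooccurrence(image, off, levels))
              for off in OFFSETS.values()]
    return sum(values) / len(values)


stripes = [[0, 1, 0, 1]] * 4  # vertical stripes, two quantized levels
print(correlation(cooccurrence(stripes, OFFSETS[90], 2)))
print(correlation(cooccurrence(stripes, OFFSETS[0], 2)))
```

The striped test field shows why direction matters: neighbours along the stripes are identical (correlation 1), while neighbours across them always differ (correlation -1), so averaging over the four directions gives a single value per image field that is less dependent on orientation.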




Fig. 3. RGB and HSB color component characteristics along the pre-assigned alignment (scan line) reported in 3a, intercepting five different image windows belonging to cullets with different color and surface characteristics. The presence of such characteristics, or "anomalies", strongly affects the spectral response and the quality of the color "signature" for use in further processing.

Statistical and neural-network procedures based only on cullet color evaluation, widely adopted in previous studies [4,5], gave quite good results. The limits of these approaches are the computing time and the need "to train" the sorting system to recognize the objects (cullets) to be sorted. They were adopted mainly because of the complexity of the available data set under investigation and because an intrinsic correlation sometimes exists between the detected parameters. The adoption of a texture-based approach seems quite promising, at least in terms of simplifying the data analysis procedures. Textural algorithms, in fact, were shown to be sensitive to some physical parameters of the cullets that are not fully recognized and numerically quantified by the "simple" color spectra-based approach.

Cullet Classification

Industrial glass recycling sorters classify by looking at one specific cullet attribute per cycle for each cullet. Assuming that the mean color and textural values for each investigated field of the glass fragment are representative of the cullet itself, it is possible, by evaluating only one cullet attribute per cycle time, to define for each family of cullets (dark green, light green, white, half-white and brown) a set of Washability Functions (WF) [6], one for each selected attribute. This approach is very easy to plot and analyze. The WF graphs have the same meaning as in the field of mineral processing. Such a plot is a cumulative plot.




[Figure 4, two panels. 4a: curves for the half white, light green, white, brown and dark green cullets
plotted against Color Tone. 4b: curves for the brown, dark green, white, light green and half white cullets
plotted against the Correlation Textural Parameter referred to the Hue Color Component (x-axis from -1 to 1).]


Fig.  4.  Color  washability  curves  based  on: 

4a.  the  blue  mean  color  value  component  (RGB  color-based  approach);  and 
4b.  the  correlation  value  computed  on  the  hue  component  (texture-based  approach)  of  the 
different  sets  of  clean  cullets  examined  (mean  value  from  the  analyses  carried  out  on  each 
digital image along the four directions: 0°, 45°, 90° and 135°).

Separation  functions,  based  both  on  color  and  textural  distribution  of  the  investigated  sample 
set,  can  thus  be  defined  and  threshold  selection  criteria  applied. 


On the x-axis, the values representative of the physical property investigated are reported, i.e., the value of
the cullet color components or textural parameters as extracted by the image analysis procedures. On the
y-axis, the number (normalised frequency) of cullets presenting a value of the color component or textural
parameter equal to or less than that of the corresponding abscissa is reported. Each set of cullets, considered
as a whole, can thus be evaluated in terms of a high or low possibility of being grouped (sorted) on the basis
of color. In Figure 4, the color-based WF is built with reference to the mean value of the blue color
component (4a), while the texture-based WF is built from the correlation parameter computed on the hue
color component (4b). The curves show that quite good sorting can be realized by adopting a "serial
recognition-classification" approach in which detection and separation of the dark green, light green and
brown cullets are done only on the basis of their color, and separation of the remaining white and half white
cullets uses textural characteristics.
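
The cumulative curves and on-off thresholds described above can be sketched as follows. This is only an illustration under stated assumptions: the attribute values are invented, and the function names and the scoring rule used to pick a threshold are hypothetical, not the procedure used by the authors.

```python
from bisect import bisect_right


def washability_curve(values):
    """Return the cumulative curve of one cullet class as (x, y) lists.

    x holds the sorted attribute values; y holds the normalised frequency of
    cullets whose value is equal to or less than the matching x.
    """
    xs = sorted(values)
    n = len(xs)
    return xs, [(k + 1) / n for k in range(n)]


def fraction_at_or_below(values, threshold):
    """Height of the cumulative curve of `values` at `threshold`."""
    xs = sorted(values)
    return bisect_right(xs, threshold) / len(xs)


def best_threshold(low_class, high_class):
    """Choose the on-off cut that best separates two classes on one attribute.

    The score (an assumption of this sketch) is the fraction of `low_class`
    at or below the cut plus the fraction of `high_class` above it, so a
    score of 2.0 means perfect separation.
    """
    candidates = sorted(set(low_class) | set(high_class))

    def score(t):
        return (fraction_at_or_below(low_class, t)
                + 1.0 - fraction_at_or_below(high_class, t))

    return max(candidates, key=score)


# Hypothetical mean blue-component values (0-255), not measured data.
brown = [20, 25, 30, 35, 40]
white = [90, 100, 110, 120, 130]

cut = best_threshold(brown, white)
print(cut, fraction_at_or_below(brown, cut), fraction_at_or_below(white, cut))
```

In a serial scheme, a cut of this kind would be applied to one attribute per cycle, first on a color component and then, for the classes that color cannot separate, on a textural parameter.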


CONCLUSION 

The adoption of a combined digital-imaging approach utilizing both color and textural cullet
recognition procedures seems quite promising, especially because the procedures can be used directly with
simple threshold ("on-off") selection criteria, quite similar to those adopted in the sorters currently in use
(based on laser technology). Further studies to enhance and simplify the procedures, especially in terms of
data detection and processing speed, will investigate possible improvements: i) adopting specific
wavelengths to enhance cullet characteristics (light transmittance, color and texture) and ii) identifying
other optical-digital characteristics to use for the classification (cullet morphology and cleavage).

ACKNOWLEDGEMENTS 

The  authors  wish  to  thank  Mr.  E.  Weber  (WTE,  Milan,  Italy)  for  his  help  during  the  selection  of  the  glass 
cullets  products  examined  and  Mr.  H.  Frish  (S+S  Metallsuchgerate  und  Recyclingtechnik,  Schonberg, 
Germany)  for  his  suggestions  during  the  development  of  the  work.  Many  thanks  also  to  Mr.  M.  Delfini  and 
Mr.  M.  Ferrini  (technical  staff  of  the  Department)  for  their  contribution  in  testing. 

REFERENCES 

1.  W.K.  Pratt,  1991.  Digital  Image  Processing.  2nd  edition,  John  Wiley  &  Sons  Inc.,  New  York. 

2.  G.  Bonifazi,  1997.  Imaging  Techniques  applied  to  Bulk  Handling  and  Processing.  Bulk97  Design 
Seminars.  Materials  Handling  Engineers’ Association.  5.3,  1-7,  The  Belfry,  Sutton  Coldfield,  UK. 

3.  G.  Bonifazi,  P.  Massacci,  1998.  Cullets  (Glass  Fragments)  Quality  Control  by  Artificial  Vision:  a  Color 
based  Approach.  Quality  Control  by  Artificial  Vision:  QCAV  '98.  94-99,  Takamatsu,  Japan. 




4. G. Bonifazi, P. Massacci, G. Patrizi, G. Zannoni, 1997. Colour Classification for Glass Recycling.
Proceedings  of  the  XX  Int.  Mineral  Processing  Congress:  IMPC97,  5, 239-252,  Aachen,  Germany. 

5.  G.  Bonifazi,  P.  Massacci,  1996.  Particle  Identification  by  Image  Processing,  KONA  No.  14,  Powder 
and  Particle.  Hosokawa  Micron  Corporation,  Osaka,  Japan. 

APPENDIX 

Digital  Image  Textural  Characterization 

Textural characterization (Pratt, 1991) of the different glass fragments was carried out by evaluating the
spatial relationship existing between the different color levels in each component, after changing the color
coordinate system from RGB (red, green, blue) to HSB (Pratt, 1991), since the three HSB
components (hue, saturation, brightness) show a lower degree of correlation than red, green and blue. To
quantify the "textural" characteristics of the cullet images, the following numerical relationships have been
considered and the results of the parameterization evaluated:

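Angular Second Moment (ASM):

$$\mathrm{ASM} = \sum_{i}\sum_{j}\left[\frac{P(i,j)}{R}\right]^{2}$$

Contrast (CON):

$$\mathrm{CON} = \sum_{n=0}^{N_g-1} n^{2}\left\{\sum_{|i-j|=n}\frac{P(i,j)}{R}\right\}$$

Correlation (COR):

$$\mathrm{COR} = \frac{\sum_{i}\sum_{j}(ij)\,P(i,j)/R - \mu_x\mu_y}{\sigma_x\sigma_y}$$

where R is the number of pairs of contiguous picture elements in the matrix being examined; $p(i,j)$ is the
element (i,j) of the normalized gray spatial-tone dependence matrix, $p(i,j) = P(i,j)/R$; $N_g$ is the number of
gray levels in the image; $\mu_x$ and $\mu_y$ are the averages of the marginal distributions associated with
$p(i,j)$; $\sigma_x$ and $\sigma_y$ are the standard deviations of the marginal distributions associated with
$p(i,j)$; x, y are the coordinates of the generic pixel on the image; $\sum_i$ and $\sum_j$ denote
$\sum_{i=1}^{N_g}$ and $\sum_{j=1}^{N_g}$, respectively; and $p_{x+y}(k) = \sum_i \sum_j p(i,j)$ with
$i + j = k$, where $k = 2, 3, \ldots, 2N_g$.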

Spatial  Tone  Dependence  Matrix 

A spatial-tone dependence matrix describes the relationships existing between the tone level values (H, S,
and B for color images or GL for gray-level images) of pixels at known distances (d pixels) and
directions (k). The formulation of these relationships allows one to build a matrix $M_k$ (the spatial-tone
dependence matrix), where k is the direction along which the distance d is calculated. If, for example, 0° is
assumed as the main direction, the corresponding matrix $M_{0^\circ}$ is defined as:

$$M_{0^\circ}(l,m,d) = \#\{[(p,q),(r,s)] \in A\}, \qquad A = \{(L_y \times L_x) \times (L_y \times L_x) \mid p - r = 0,\ |q - s| = d,\ I(p,q) = l,\ I(r,s) = m\}$$

where: l = row index for matrix $M_k$, m = column index for matrix $M_k$, d = distance, along the direction k,
defining the spatial relationship of nearby pixels, p: row index of the first pixel considered in the image, q:
column index of the first pixel considered in the image, r: row index of the second pixel considered in the
image, s: column index of the second pixel considered in the image, $L_x$: number of pixels in the image
along the x axis and $L_y$: number of pixels in the image along the y axis. The tone (l) (color component or
gray level) of the pixel at co-ordinates (p,q) is compared with the tone (m) (color component or gray level)
associated with the pixel at co-ordinates (r,s). The symbol # represents the number of pixel pairs that satisfy
the required criteria. The element $M_{0^\circ}(l,m,d)$ is thus equal to the number of ordered pixel pairs
(p,q), (r,s) whose distance along the y axis is zero and along the x axis is d, and whose tones g(p,q) and
g(r,s) have values (l-1) and (m-1), respectively. The matrices are symmetrical, i.e.: $M_k(l,m,d) = M_k(m,l,d)$.

So, a normalized spatial-tone dependence matrix is one whose generic element equals $M_k(l,m,d)/T_k$,
where $T_k$ assumes the following expressions for the main directions 0°, 45°, 90° and 135°:
$T_{0^\circ} = 2(N_x - 1)N_y$, $T_{45^\circ} = 2(N_x - 1)(N_y - 1)$, $T_{90^\circ} = 2N_x(N_y - 1)$ and
$T_{135^\circ} = 2(N_x - 1)(N_y - 1)$, respectively.
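
As an illustration of the definitions above, the following sketch builds the symmetric matrix for the 0° direction and derives the three textural parameters from it. It assumes NumPy is available and uses an invented 4 x 4 image; the function names are hypothetical, tone levels are taken as integers starting at 0, and contrast is computed as the sum of (i-j)^2 p(i,j), which is the same quantity as the sum over n of n^2 times the mass on |i-j| = n.

```python
import numpy as np


def stdm_0deg(img, d=1, levels=None):
    """Symmetric spatial-tone dependence matrix for direction 0 deg, distance d >= 1.

    `img` is a 2-D array of integer tone levels starting at 0.
    """
    img = np.asarray(img, dtype=int)
    if levels is None:
        levels = int(img.max()) + 1
    M = np.zeros((levels, levels))
    first = img[:, :-d].ravel()    # tone at (p, q)
    second = img[:, d:].ravel()    # tone at (p, q + d), same row
    np.add.at(M, (first, second), 1)   # count each pair ...
    np.add.at(M, (second, first), 1)   # ... and its reverse, so M is symmetric
    return M


def texture_features(M):
    """Return (ASM, CON, COR) from an un-normalised matrix M."""
    p = M / M.sum()                        # p(i, j) = P(i, j) / R
    i, j = np.indices(p.shape)
    asm = float((p ** 2).sum())
    con = float(((i - j) ** 2 * p).sum())
    tones = np.arange(p.shape[0])
    px, py = p.sum(axis=1), p.sum(axis=0)  # marginal distributions
    mx, my = (tones * px).sum(), (tones * py).sum()
    sx = np.sqrt(((tones - mx) ** 2 * px).sum())
    sy = np.sqrt(((tones - my) ** 2 * py).sum())
    # Correlation is undefined (division by zero) for a perfectly flat image.
    cor = float(((i * j * p).sum() - mx * my) / (sx * sy))
    return asm, con, cor


# Hypothetical 4 x 4 image with four tone levels (0..3), not measured data.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
M = stdm_0deg(image, d=1)
asm, con, cor = texture_features(M)
print(M.sum(), asm, con, cor)
```

For d = 1 the total count M.sum() equals 2(N_x - 1)N_y, the normalizing constant quoted for the 0° direction. The other three directions follow the same pattern with different pixel offsets.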




Application  of  Heuristics  and  Fuzzy  Logic  to 
Natural  Resource  Modelling 

Steven  Mackinson 

Fisheries  Centre,  2204  Main  Mall,  University  of  British  Columbia,  B.C.,  Canada 
Phone  604-822-2731,  Email:  smackin@fisheries.com 

ABSTRACT 

The complexity and dynamics of natural systems pose considerable difficulties for mathematical
description. Conventional modelling techniques that rely purely on an analytical, algorithmic approach are
poor at capturing the non-linear processes, cumulative effects and uncertainties characteristic of such systems.
In this respect, heuristic models using the principles of fuzzy logic are well suited for describing and
simulating processes and dynamics of natural systems. Currently, there are few examples of fuzzy
knowledge-based models in natural resource management and there is significant potential for future
development. Two recently successful applications are demonstrated here: (i) a fuzzy expert system that
applies a novel method of defuzzification to predict changes in the structure, dynamics and mesoscale
distribution of fish shoals, and (ii) a method utilising fuzzy approximation theory to predict the recruitment
of young fish to a fishery based on the parent stock size and past recruitment conditions.


INTRODUCTION 

Natural resource management demands recognition of the inherent variability of natural processes. In an
attempt to satisfy this need, analytical methods for explicit quantification of variability and uncertainty
have pervaded the scientific discipline [1]. However, ecosystem processes in general do not yield well to
description by conventional analytical techniques; their complexity, non-stationarity, and non-linear, even
chaotic, features defy conformity. Fuzzy knowledge-based systems offer an alternative tool to traditional
analytical models. Relationships descriptive of biological and ecological processes can be easily explained
and understood through the natural language heuristic rules that define them. Use of knowledge-based
systems is an admission that our knowledge is incomplete and uncertain, yet through building and testing, it
is a move toward practicality, recognising that making decisions based on qualitative and sometimes
incomplete knowledge is still better than making decisions without any understanding [2].

Knowledge-based systems have been used in the field of natural resource management for some time [3],
although applications in fisheries science are more limited [2]. There is considerable scope for future
applications, particularly those utilising fuzzy logic. Two recently successful 'fuzzy' applications are
presented here: (i) a fuzzy expert system to predict structure, dynamics and mesoscale distribution of fish
shoals [4,5], and (ii) a method using fuzzy approximation theory to predict recruitment of young fish to a
fishery [6].


PREDICTING  STRUCTURE  AND  DISTRIBUTION  OF  FISH  SHOALS 

Despite  recent  attempts  to  link  cross  scale  behaviour  dynamics  and  distribution  studies  on  shoaling  fish  [7], 
large  gaps  still  exist  in  our  basic  scientific  understanding.  Nonetheless,  the  knowledge  of  fishers  and  fishery 
managers  has  not  readily  been  incorporated  into  scientific  analyses,  despite  the  fact  that  such  information  is 
rich  in  observation  since  knowledge  of  fish  behaviour  and  distribution  is  a  prerequisite  for  their  profession. 
A  fuzzy  logic  expert  system  is  used  as  a  formal  framework  to  combine  local  knowledge  from  interviewed 
fishers,  fishery  managers  and  First  Nations  people  (24  interviews),  with  more  conventional  scientific 
information  from  interviewed  fisheries  scientists  (7  interviews),  field  work  studies  (3)  and  published 
literature  sources  (102  references),  in  an  attempt  to  bridge  some  gaps  in  our  knowledge.  All  knowledge 
contributes  equally  in  building  the  knowledge  base,  thus  the  potential  of  all  data  sources  is  maximised  [8]. 

A 'bottom up' conceptual approach is used in development of the model: heuristic rules define
relationships between ecological, biological and motivational factors and their effect on fish behaviour. Of


0-7803-5489-3/99/$10.00  ©1999  IEEE. 




those factors, the key attributes are considered to be food, predation and reproductive state [9]. The
dynamic interplay of these attributes, combined with temporal changes in the fish's 'life-priorities', results in
trade-offs producing behavioural responses that are manifested as changes in shoal structure, dynamics and
distribution.

Defining  functional  relationships  using  rules 

Heuristic rules written in natural language form relationships between attributes that influence fish
behaviour and descriptors of shoal structure, dynamics and distribution. For example:

IF fish direction facing current
AND current strength strong
THEN mean swimming speed low (item confidence = x)
AND shoal shape horizontally elongated (item confidence = y)

In  the  rule  above,  the  variable  'current  strength'  is  designated  a  fuzzy  variable  with  member  sets  strong  and 
not  strong  (Fig  1).  Fuzzy  sets  allow  construction  of  a  model  that  directly  represents  knowledge  contained  in 
linguistic  expressions  given  by  an  interviewee. 


[Figure 1 panel title: Member sets of the fuzzy variable current strength]


Fig 1. Membership functions of fuzzy sets on the fuzzy variable 'current strength'. The member sets (also
called subsets) are the linguistic concepts strong and not strong. The slope and degree of overlap of
the membership functions are key elements determining the uniqueness or 'fuzziness' of the sets. The
confidence on the Y-axis shows our degree of belief in the linguistic concepts. For example, when current
strength is 4 knots, we are 0.8 confident that current strength is not strong and also 0.2 confident that
current strength is strong. In an expert system, both pieces of information are used simultaneously to draw
conclusions, thus avoiding the simplistic notion that something is or is not true, when in fact it may be both
to different degrees. Thus, the system implicitly captures uncertainty. The value of current strength whose
membership (confidence) is 1 is called the supremum value. The range of current strength values
contained by a fuzzy set is called the support.
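
A minimal sketch of the two member sets may help make the caption concrete. The linear ramp and its breakpoints (3 and 8 knots) are assumptions chosen only so that 4 knots reproduces the confidences quoted above; the actual shapes in Fig 1 are not recoverable from the text, and the function names are hypothetical.

```python
def strong(knots, lo=3.0, hi=8.0):
    """Confidence that current strength is 'strong'.

    Assumed shape: 0 below `lo`, 1 above `hi`, linear in between.
    """
    if knots <= lo:
        return 0.0
    if knots >= hi:
        return 1.0
    return (knots - lo) / (hi - lo)


def not_strong(knots, lo=3.0, hi=8.0):
    """Confidence that current strength is 'not strong' (complement of strong)."""
    return 1.0 - strong(knots, lo, hi)


conf_strong = strong(4.0)
conf_not_strong = not_strong(4.0)
print(conf_strong, conf_not_strong)
```

Both confidences are passed on together when a rule such as the one above fires, which is how the system carries partial truth forward instead of forcing a yes-or-no answer.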

Relative  influence  of  attributes  -  hierarchy,  trade-offs  and  seasonality 

A 'weight of evidence' approach is used to impose a hierarchy on the degree of influence each attribute has
on the resulting structure, dynamics and distribution of shoals. The method assumes that
the more frequently an attribute is mentioned, the higher its importance relative to other contributing
factors. Weight is applied by assigning to the THEN statements of each rule an associated confidence
factor that is the sum of two parts: interviews and literature (each of which is given equal
importance). A combined uncertainty of 10% is assumed for all rules, thus the maximum confidence that a
THEN  statement  can  achieve  is  0.9.  During  operation,  confidence  assigned  to  each  THEN  statement 
propagates  through  the  system  adding  confidence  to  the  output  descriptor.  Those  statements  with  higher 
confidence  carry  more  'weight'  and  have  greater  effect.  The  'weight  of  evidence'  approach  further 
substitutes  as  a  means  to  represent  behavioural  trade-offs  that  occur  when  fish  balance  conflicting  forces. 
For  instance,  since  the  effect  of  predator  abundance  on  packing  density  has  a  higher  confidence  (Conf.  = 




0.19)  than  the  effect  of  feeding  competition  (Conf.  =  0.01),  predators  will  have  a  greater  influence  on 
packing  density  even  during  competitive  interactions. 

Operational logic, including rules and commands, is also applied to define how the model operates under
specific circumstances; in certain scenarios, rules may be ignored whilst others are followed, or variables
may be pre-assigned (in particular, when they are deemed of low importance). In addition, the user may be
offered placebos (choices that do not lead to any conclusions) or the opportunity to assign low importance
to a particular factor. This provides the option of choosing to exclude or reduce the influence of certain
attributes. However, if the user answers 'not sure' where knowledge is available, an effort is made to
assign a default choice/value.

Temporal  changes  in  motivational  state  are  modelled  by  assigning  a  group  of  'life-priority'  rules  that 
designate  behavioural  priorities  for  feeding,  avoiding  predators,  reproduction  and  energy  saving  during 
each  life  stage.  The  designations  of  priority  are  utilised  in  a  pseudo-weighting  method  that  applies  weight 
to  a  specific  variable  used  to  represent  that  priority.  The  effect  of  the  pseudo-weight  is  manifested  through 
changes  in  the  structure,  dynamics  and  distribution  of  shoals. 

Output  descriptors  of  shoal  structure,  dynamics  and  distribution 

The  weighted-average  method  of  defuzzification  is  used  to  obtain  a  single  non-fuzzy  value  as  output  for 
quantitative  descriptors.  The  method  is  based  on  a  multiplication  between  the  degree  of  membership  in  the 
output  fuzzy  sets  and  the  supremum  value  of  each  set  (Fig  2).  By  applying  the  same  procedure  to  maximum 
and  minimum  ranges  associated  with  each  of  the  output  fuzzy  sets  we  also  obtain  a  range  around  the 
discrete  output  value  (Fig  2).  Using  the  example  in  Figure  2,  the  discrete  defuzzified  weighted  output 
would  be  calculated  as  follows: 

Mean = [(0.2 * Small_sup) + (0.6 * Med_sup) + (0.3 * Large_sup)] / sum of confidence (1.1)

Range min. = [(0.2 * Small_min) + (0.6 * Med_min) + (0.3 * Large_min)] / sum of confidence (1.1)

Range max. = [(0.2 * Small_max) + (0.6 * Med_max) + (0.3 * Large_max)] / sum of confidence (1.1)
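
The three expressions above share one calculation, sketched below. The memberships (0.2, 0.6, 0.3) are those of the example; the supremum, minimum and maximum shoal sizes are invented placeholders, since the values behind Fig 2 are not given in the text, and the function name is hypothetical.

```python
def weighted_average(memberships, values):
    """Weighted-average defuzzification: sum(mu * value) / sum(mu)."""
    total = sum(memberships.values())
    return sum(memberships[s] * values[s] for s in memberships) / total


# Degrees of membership in the output fuzzy sets, as in the example (sum = 1.1).
memberships = {"small": 0.2, "medium": 0.6, "large": 0.3}

# Hypothetical shoal sizes for each set: supremum value and support limits.
sup = {"small": 100, "medium": 1000, "large": 10000}
lower = {"small": 10, "medium": 500, "large": 5000}
upper = {"small": 500, "medium": 5000, "large": 50000}

mean = weighted_average(memberships, sup)
range_min = weighted_average(memberships, lower)
range_max = weighted_average(memberships, upper)
print(mean, range_min, range_max)
```

Dividing by the sum of confidence rather than assuming the memberships add to 1 is what allows overlapping rules to contribute more than a total confidence of 1.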


Fig 2. Output fuzzy sets for shoal size used in defuzzification. Small_sup represents the supremum value of
shoal size for the fuzzy set small. Similarly, Small_min to Small_max represents the support of the fuzzy set
small. Supremum values used as weights are obtained from an extensive literature review of published
values observed in the field.

Results:  predicted  seasonal  dynamics  in  herring  shoal  structure 

Figure 3 displays predicted changes in shoal size and packing density during 11 different phases of the
annual  life  cycle.  Note  that  the  interval  between  predictions  is  not  related  directly  to  time.  Each  prediction 




is  a  snapshot  in  ecological  time,  since  the  temporal  scale  required  to  capture  the  necessary  changes  varies 
between  seasons.  The  predicted  patterns  show  good  correspondence  with  observations  on  herring  shoals. 
Overwintering  is  recognised  as  a  relatively  passive  phase  in  the  life  cycle  during  which  there  is  little 
feeding  activity  and  the  main  priorities  are  predator  avoidance  and  energy  conservation  [10].  During  this 
stage,  very  large  shoals,  or  aggregations  of  shoals,  are  commonly  found  distributed  as  layers  deep  in  the 
water column [11]. Increased shoal size and packing density are known to be a typical anti-predator strategy
for shoaling fish [9]. Prior to spawning, large winter aggregations break down and move to shallower areas,
where again they may hold for a while, forming dense schools immediately over spawning areas [12].
During maturation stage 2-1, large schools tend to break up into smaller, very dense, mobile schools
(Mackinson, in prep). Re-aggregation occurs during spawning, with large shoals forming on spawning sites.
Immediately after, spawned-out fish begin their migration to ocean feeding grounds and rapidly disperse
into very small, low density shoals that swim fast and high in the water column [13, 14]. Density and size
of ocean feeding shoals are reduced to enhance foraging [15, 7]. Ocean feeding shoals of North Sea herring
during  summer  are  observed  to  be  half  of  overwintering  shoals  of  Norwegian  spring  spawning  herring  [16]. 




Fig  3.  Quantitative  predictions  of  seasonal  changes  in  herring  shoal  size  and  packing  density.  Pre-sp  3-2, 
2-1,  1:  pre-spawning  period  divided  into  3  maturation  stages;  Spawning:  spawning;  Imm-ps:  Immediate 
post  spawned;  Off-shore:  offshore  migrating;  Ocean  1,2,3:  ocean  feeding  phase  during  3  stages  of  summer 
with  changes  in  food  and  predator  abundance  and  distribution  of  food;  On-shore:  onshore  migrating; 
Overwinter:  overwintering. 

MODEL-FREE  ESTIMATION  OF  STOCK-RECRUITMENT  RELATIONSHIPS 

Within fisheries science, experience has shown that if a stock is fished hard, there is a point at which
recruitment drops due to over-fishing [17,18]. Determining the relationship between stock and recruitment
is the cornerstone for predicting how much the parent stock can be reduced by fishing without negatively
impacting future productivity. By applying the principles and techniques of fuzzy approximation theory
[19,20], heuristic reasoning can be used to define stock-recruitment relationships, explicitly characterise
vagueness and uncertainty, and provide a functional relationship that combines stock size and past
recruitment to predict future recruitment. The approach is termed model-free estimation or approximation.

Simply  stated,  fuzzy  approximation  theory  demonstrates  that  any  curve  can  be  approximated  by  covering  it 
with  patches.  Stock-recruitment  relationships  are  frequently  riddled  with  patches  (Fig  4).  The  model-free 
approach  to  stock-recruitment  consists  of  two  stages:  (1)  using  previous  data  to  create  rules  that  define  the 
fuzzy  system  relating  stock  and  past  recruitment  to  future  recruitment  (Fig  4);  (2)  predicting  recruitment 
with  the  fuzzy  system.  Embedded  within  stage  one  are  two  clustering  techniques  used  to  define  the  fuzzy 
sets  on  stock  and  recruitment.  Visual  clustering  relies  on  the  ability  to  visually  define  clusters  or  patches  in 
the  data  (Fig  4).  Iterative  clustering  defines  patches  or  clusters  according  to  an  iterative  scheme  that 
minimises  an  objective  function.  The  core  element  relies  on  an  impartial  fuzzy  cluster  analysis  routine  [21] 
that  is  modified  to  define  supremum  and  support  values  of  fuzzy  sets  on  stock  and  recruitment  [6]. 
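
The two stages can be illustrated with a deliberately reduced sketch: fuzzy sets on stock size are defined by their supremum values, each set is linked by one rule to a recruitment set, and the prediction is the membership-weighted average of the recruitment supremum values. Everything specific here is an assumption for illustration: the triangular set shape, the numbers, the rule table and the function names are hypothetical, and past recruitment is left out as an input for brevity.

```python
def memberships(x, peaks):
    """Degrees of membership of x in triangular sets with supremum values `peaks`.

    `peaks` maps set name -> supremum value, listed in ascending order.
    Adjacent sets overlap so that the degrees for any x sum to 1.
    """
    names = list(peaks)
    pts = [peaks[n] for n in names]
    if x <= pts[0]:
        return {names[0]: 1.0}
    if x >= pts[-1]:
        return {names[-1]: 1.0}
    for k in range(len(pts) - 1):
        a, b = pts[k], pts[k + 1]
        if a <= x <= b:
            w = (x - a) / (b - a)
            return {names[k]: 1.0 - w, names[k + 1]: w}


def predict_recruitment(stock, stock_peaks, rules, recruit_peaks):
    """Fire every rule to the degree its stock set is matched, then average."""
    mu = memberships(stock, stock_peaks)
    total = sum(mu.values())
    return sum(m * recruit_peaks[rules[s]] for s, m in mu.items()) / total


# Hypothetical supremum values and rules (one rule per data patch).
stock_peaks = {"small": 1000, "intermediate": 2000, "large": 3000}
recruit_peaks = {"low": 3000, "medium": 6000, "high": 9000}
rules = {"small": "low", "intermediate": "high", "large": "medium"}

estimate = predict_recruitment(2500, stock_peaks, rules, recruit_peaks)
print(estimate)
```

In the approach described in the text, the supremum and support values would come from visual or iterative clustering of the data rather than being set by hand as they are here.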




Fig 4. Visual clustering on ICES Plaice VIIe data.

Supremum values are defined according to the centre of identified clusters in each plane. Dotted ellipses
represent approximate clusters on the stock variable, and solid ellipses are approximate clusters on the
recruit variable. Grey patches identify data clusters which equate to rules that relate stock size (SS) to
recruitment (R). Data point Y with value of SS=3117 belongs to intermediate stock size to a degree of 0.15
and to large stock size to a degree of 0.85. Similarly the value R=3062 belongs to low recruitment to a
degree of 0.87 and medium recruitment to a degree of 0.13.


The model-free estimation approach is capable of describing stock-recruitment relationships as well
as traditional analytical methods (Fig 5). A comparison of residuals between recruitment values predicted
by Ricker or Beverton-Holt curves (common analytical functions) and the fuzzy approximations for 8
different fish species revealed no significant difference [6]. The maximum number of sets required to
capture the pattern in data on any one of the axes was four. In the majority of cases, 3 sets were sufficient.


[Figure 5 plot. Series: real data, Ricker, Fuzzy-approx (VC), Fuzzy-approx (IC). x-axis: Stock size
(0 to 4000); y-axis scaled from 0 to 14000.]


Fig  5.  Stock-recruitment  data  in  comparison  with  the  fuzzy  approximation  approach  and  traditional 
analytical  method  (Ricker).  Results  for  two  methods  of  clustering  are  displayed;  visual  clustering  (VC)  and 
iterative clustering (IC). Data taken from [18] for ICES division VIIe plaice.




DISCUSSION 

The  essence  of  fuzzy  logic  rests  on  the  truism  that  all  things  admit  degrees  of  vagueness.  Black  and  white 
cases  are  the  exception  in  a  world  of  gray.  Similarly,  natural  systems  do  not  conform  to  crisp  definitions. 
Use  of  heuristics  makes  it  possible  to  express  qualitative  information  in  a  series  of  rules  that  constitute  a 
knowledge  base.  Since  fuzzy  sets  are  able  to  model  words  mathematically,  application  of  fuzzy  logic 
[22,23]  takes  this  one  step  further,  allowing  the  integration  of  both  qualitative  and  quantitative  information. 

Much of our current understanding of fish distribution is largely qualitative and highly uncertain. Such
information does not lend itself well to mathematical representation and consequently traditional numerical
modelling techniques may be inappropriate [6]. Using input pertaining to the biotic and abiotic
environmental conditions, the first example presented here (Herring Shoal Structure and Distribution expert
system: HSSDex) uses heuristic rules to predict structure, dynamics and mesoscale distribution of shoals of
migratory adult herring during different stages of their annual life cycle. Comprising more than 35
potential inputs and 23 outputs, the fuzzy expert system uses a 'bottom up' conceptual approach to link
multiple causative and inter-related factors. The system is flexible in its ability to forecast shoal
structure, dynamics and mesoscale distribution across different temporal scales. Accuracy of prediction is
dependent on both the accuracy of information captured in rules and the realism of the input provided by
the user. Several strategies are implemented to avoid predictions breaking down due to inaccurate input by
the user. However, even with these safeguards, the onus is ultimately on the user to provide realism in the
scenario they develop when providing input. Since the fuzzy rules do not contain high precision, we do not
expect highly precise predictions. More important is the capability to predict general patterns observed in
nature. Through the option of changing various weights, the model becomes adaptable and thus can be
tuned to particular circumstances.

For many of the world's pelagic fish stocks, the structure, dynamics and mesoscale distribution of fish shoals
have considerable importance to central issues in fisheries management, including stock structure, stock
assessment, resilience and harvest control. The resolution of the model predictions is at an appropriate
scale for their incorporation in future models that address some aspects of these critical issues.

The  second  example,  model-free  estimation  of  stock-recruitment  relationships,  fulfils  the  criteria  that 
Hilborn and Mangel [1] define for useful models: “a model is most effective if it provides both
understanding  (of  known  patterns)  and  prediction  (about  situations  not  yet  encountered)”.  The  approach 
offers  a  new  and  alternative  way  to  express  uncertainties  about  the  relationship  between  stock  and 
recruitment,  by  means  of  vagueness  in  the  definition  of  the  variables,  the  shape  of  the  membership 
functions  and  the  actual  clustering  and  overlap  of  the  data  among  the  different  fuzzy  sets.  Uncertainties  are 
then  expressed  in  terms  of  fuzziness  rather  than  probabilities.  A  fuzzy  system  lets  us  guess  at  the  non-linear 
world  and  yet  does  not  require  us  to  formulate  a  mathematical  model  [19].  Application  shows  that  in 
comparison  to  commonly-used  analytical  stock-recruitment  functions,  fuzzy  approximations  fit  the 
observed  data  at  least  as  well;  are  robust  with  respect  to  the  number  of  sets  required  to  describe  the  data; 
and  capture  some  important  behaviours  of  the  relationship.  The  major  benefit  is  that  there  is  no  need  to 
form  hypotheses,  or  build  a  model  and  determine  its  parameters  prior  to  fitting  the  approximation. 

In  comparison  with  more  conventional  modelling  techniques  that  rely  on  describing  relationships  with 
mathematical  functions,  fuzzy  knowledge-based  models  are  similarly  able  to  describe  continuous 
relationships and include feedback effects. In contrast, they do not suffer from the same constraints; when
knowledge is incomplete, rules can still be used to describe 'pieces' of relationships without requiring gross
assumptions. Moreover, the transparency of the models, both in terms of their intuitive operation and the
ability to access expert knowledge when questioning reasoning, contrasts with the apparent 'mysteriousness'
of typical analytical models. In the field of natural resource science, these techniques are still largely
considered novel and in stark contrast to conventional approaches. As a consequence they may not
necessarily be readily accepted and may suffer from alienation. Despite this concern, the future holds
promise  for  many  applications  in  natural  resource  management.  Possible  areas  of  application  that  will 
prove  fruitful  include:  (1)  Descriptive  and  predictive  modelling,  (2)  Risk  assessment  and  decision  analysis, 
(3)  Pattern  recognition  in  data  structures,  (4)  Incorporating  local/  traditional  ecological  knowledge  with 
scientific  knowledge  for  use  in  assessment  and  management. 




REFERENCES 

1. Hilborn, R. and Mangel, M. 1997. The ecological detective: confronting models with data. Princeton University Press, Princeton, New Jersey. 315p.

2. Saila, S.B. 1996. A guide to some computerized artificial intelligence methods, pp. 8-40. In: Megrey, B. and E. Moksness (eds.). Computers in Fisheries Research. Chapman and Hall, New York.

3. Davis, J.R. and Clark, J.L. 1989. A selective bibliography of expert systems in natural resource management. AI Applications in Natural Resource Management, 3(3): 1-8.

4. Mackinson, S. and Newlands, N. 1998. Using local and scientific knowledge to predict distribution and structure of herring shoals. ICES CM 1998:J11, 18pp.

5. Mackinson, S. In prep. An adaptive fuzzy expert system for predicting structure, dynamics and distribution of herring shoals.

6. Mackinson, S., Vasconcellos, M., and Newlands, N. 1999. A New Approach to the Analysis of Stock-Recruitment relationships: 'Model-free estimation' using fuzzy logic. Can. J. Fish. Aquat. Sci., in press.

7. Mackinson, S., Nottestad, L., Guenette, S., Pitcher, T.J., Misund, O.A. and Fernö, A. 1998. Distribution and behavioural dynamics of ocean feeding Norwegian spring spawning herring: observations across spatio-temporal scales. ICES CM 1998:J12.

8. Mackinson, S. and Nottestad, L. 1998. Combining local and scientific knowledge. Rev. Fish Biol. Fish. 8(4): 481-490.

9. Pitcher, T.J. and Parrish, J.K. 1993. Functions of schooling behaviour in teleosts. In: The Behaviour of Teleost Fishes, 2nd ed., Ed. by T.J. Pitcher. Croom Helm, London & Sydney, 364-439.

10. Huse, I. and Ona, E. 1996. Tilt angle distribution and swimming speed of overwintering Norwegian spring spawning herring. ICES J. Mar. Sci. 53: 863-873.

11. Mohr, H. 1971. Behaviour pattern of different herring stocks in relation to ship and midwater trawl. In: Modern fishing gear of the world, 3, pp. 368-371. [ed.] H. Kristjonsson. Fishing News Books Ltd, Farnham, Surrey, England.

12. Hay, D.E. 1985. Reproductive biology of the Pacific herring (Clupea harengus pallasi). Can. J. Fish. Aquat. Sci. 42 (Suppl 1): 111-126.

13. Hourston, A.S. and Haegele, C.W. 1980. Herring on Canada's Pacific coast. Can. Spec. Publ. Fish. Aquat. Sci. 48: 23 p.

14. Nottestad, L., Aksland, M., Beltestad, A., Fernö, A., Johanessen, A. and Misund, O.A. 1996. 02 27. Schooling dynamics of Norwegian spring spawning herring (Clupea harengus L.) in a coastal spawning area. Sarsia 80, 277-284. Bergen. ISSN 036 - 4827.

15. Robinson, C.M. and Pitcher, T.J. 1989. The influence of hunger and rations on shoal density, polarisation and swimming speed of herring (Clupea harengus L.). J. Fish. Biol., 35: 459-60.

16. Misund, O.A. 1990. Sonar observations of schooling herring: school dimensions, swimming behaviour, and avoidance of vessel and purse seine. Rapp. P.-v. Reun. Cons. Int. Explor. Mer. 189: 135-146.

17. Cushing, D.H. 1971. The dependence of recruitment on parent stock in different groups of fishes. J. Cons. Int. Explor. Mer. 33: 340-362.

18. Myers, R., Bridson, J. and Barrowman, N.J. 1995. Summary of worldwide spawner and recruitment data. Can. Tech. Rep. Fish. Aquat. Sci. 2020: iv + 327p.

19. Kosko, B. 1993a. Fuzzy systems as universal approximators. IEEE transactions on computers, 1993. Proceedings of the 1992 IEEE conference on fuzzy systems (FUZZ-92), 1153-1162, San Diego, March 1992.

20. Kosko, B. 1993b. Fuzzy thinking: the new science of fuzzy logic. Publ. Hyperion, New York. 318p.

21. Bezdek, J.C. 1981. Pattern recognition with fuzzy objective function algorithms. Advanced applications in pattern recognition. New York: Plenum Press. 256p.

22. Zadeh, L.A. 1965. Fuzzy sets. Information and Control. 8 (3): 338-353.

23. Zadeh, L.A. 1973. Outline of a new approach to the analysis of complex systems and decision processes. IEEE transactions on systems, man and cybernetics, Vol SMC-3, No. 1. January 1973.




ARDx  -  A  Fuzzy  Expert  System  for  ARD  Site  Remediation 

J.V.  Balcita,  J.A.  Meech,  M.M.  Ghomshei 

Department  of  Mining  and  Mineral  Process  Engineering, 

University  of  British  Columbia, 

Vancouver,  B.C.,  Canada 


ABSTRACT 

This  paper  details  development  of  an  expert  system  using  fuzzy  techniques  to  design  remediation  techniques 
for  sites  contaminated  by  Acid-Rock-Drainage.  The  fuzzy  system  is  able  to  deal  with  missing,  inaccurate,  or 
heuristic  data  and  still  make  useful  design  decisions. 

Fuzzy  sets  are  defined  using  a  functional  relationship  between  the  degree  of  belief  in  a  certain  qualitative 
concept  and  one  or  more  quantitative  variables.  Rules  were  developed  during  interviews  with  a  chosen 
expert in the field. Using site data and characterization input by the user, these rules associate the degree of belief in one
concept with that of other concepts to produce a decision or conclusion.

The  development  of  a  fuzzy  expert  system  for  ARD  is  a  benefit  since  it  produces  a  standardized  adaptable 
approach  to  the  problem  and  provides  quick  advice  to  a  user  looking  for  a  preliminary  but  detailed  analysis. 

The  work  done  to  this  point  includes  a  fuzzy  controller,  separate  control  modules  for  treatment  options,  cost 
analysis,  and  interactive  hypertext  documents.  The  controller  and  control  modules  work  together  in  an 
attempt  to  follow  the  decision-making  process  as  it  chooses  an  appropriate  treatment  option  for  a  site  with 
possible  or  existing  ARD.  The  hypertext  documents  are  set  up  as  user  help-resources  to  provide  system 
output  information  and  to  use  as  a  training  tool  on  treatment  options  for  ARD  or  as  a  diagnostic  tool  on  the 
possibility  of  implementing  a  treatment  system. 


BACKGROUND 

Acid  Rock  Drainage  (ARD)  is  contaminated  acidic  drainage  from  the  spontaneous  weathering  and  oxidation 
of pyrite and other sulfide minerals [1]. Weathering conditions increase the solubility of heavy metals,
radionuclides,  sulfate,  and  acidity;  and  reduce  the  pH  of  the  drainage.  ARD  impacts  on  watershed 
characteristics  and  creates  adverse  effects  in  the  surrounding  ecosystem. 

The  problem  exists  in  coal  as  well  as  metal  mines.  Once  exposed,  waste  rock  and/or  tailings  dams  may 
continue  to  generate  such  acidity  and  pollution  for  decades  and  perhaps,  centuries.  It  is  imperative  that 
prediction  and  prevention  be  used  as  a  primary  method  to  deal  with  and  control  ARD  at  virtually  all  mine 
sites.  However,  in  active  or  abandoned  mine  sites  where  the  problem  already  exists,  and  as  a  supplement  to 
preventative measures in new mines, treatment of the contaminated drainage is necessary [2]. This can add
appreciably  to  the  on-going  operating  costs  of  a  mine. 

Expertise in the field of ARD is often controversial as fundamental knowledge is lacking, experts are
scarce, and new knowledge is continually being sought and applied. Predictions of weather conditions,
surface and ground-water flows, the chemistry of reactions in the waste piles and dissolution kinetics are all
fraught with significant errors. Data to assess and deal with ARD problems are often missing and so
heuristic judgements play an important role in decision-making. A fuzzy system is able to handle and
manipulate missing, inaccurate, or heuristic data [3]. A fuzzy expert system thrives on these conditions.

The  development  of  a  fuzzy  expert  system  for  ARD  is  of  benefit  since  it  produces  a  standardized  adaptable 
approach  to  the  problem,  provides  quick  advice  to  a  user  and  is  equipped  for  training  and  teaching. 


0-7803-5489-3/99/$10.00 ©1999 IEEE.




ARDX  COMPONENTS 

In  its  entirety,  the  ARDX  system  is  designed  to  handle  ARD  problems  ranging  from  prediction,  through  to 
prevention  and  monitoring  for  treatment.  The  scope  of  the  project  to  date  deals  with  decision-making  tactics 
for  prevention  and  treatment  at  a  mine  site  in  one  of  three  mine  stages:  planning,  operating,  or  closure.  The 
components  of  the  system  come  together  through  a  knowledge  base  and  inference  engine  by  using  rules  and 
fuzzy  concepts;  and  with  an  explainer  engine  providing  reasoning,  explanations  and  answers  to  user 
questions.  Figure  1  shows  the  components  of  the  ARDX  system.  The  knowledge  base  is  itself  made  up  of  a 
main  system  module  (ARDX  "main")  and  numerous  sub-modules. 


Fig.  1.  Basic  ARDX  configuration  flow-chart. 


ARDX  "main"  interacts  with  the  sub-modules  and  drives  the  system.  It  communicates  with  the  user  by 
asking  for  site  specific  data  input;  moves  through  the  appropriate  sub-modules;  assesses  whether  an 
appropriate  recommendation  has  been  found;  and  cycles  through  again  or  exits  the  system  as  required. 
Forms  pop  up  when  called  upon  for  data  input,  and  hypertext  files  are  used  for  system  output.  ARDX 
"main"  decides  the  final  recommended  treatment  options  for  the  site  by  assessing  the  cost  of  treatment  with 
the  probability  of  success  for  each  treatment  option. 

DEVELOPMENT  PROCEDURE 

Development  of  an  expert  system  requires: 

•  a  clear  definition  of  the  problem  and  domain  of  the  system 

•  knowledge  acquisition 

•  system  development  (programming  steps) 

•  testing  and  verification  of  the  system. 

DEFINING  THE  PROBLEM  AND  DOMAIN 

The  first  step  often  poses  the  most  obstacles.  In  this  case  the  domain  of  the  system  was  designed  to  include 
all  treatment  possibilities.  However,  during  knowledge  acquisition,  it  became  clear  that  the  chosen  domain 
was  extremely  large.  Rather  than  change  the  chosen  domain  of  the  system,  it  was  decided  to  focus  on  two 
aspects,  an  overall  structure  to  the  decision-making  process  (developed  through  ARDX  "main")  and  a  more 
detailed  evaluation  of  separate  treatment  options  (sub-modules).  In  this  way  a  working  system  could  be 
developed  while  separate  modules  containing  further  treatment  options  were  added,  revised  or  discarded  as 
seen  fit.  ARDX  "main"  has  become  the  seed  for  development  of  a  larger  and  more  complete  system.  Figure 
2  shows  the  basic  ARDX  flow-chart  that  has  become  instrumental  in  organizing  and  developing  the  system. 

Essentially,  the  system  examines  all  appropriate  methods  in  terms  of  their  ability  to  deal  with  the  potential 
or  existing  problem.  Once  a  particular  method  or  methods  have  been  evaluated,  the  system  looks  for 
combinations  of  methods  that  may  improve  the  solution  further.  When  these  have  been  assessed,  the  cost  of 
each  option  is  calculated  and  the  recommendations  of  the  least-costly,  most-effective  options  are  presented 
to  the  user. 


[Figure 2: flowchart. Labels recoverable from the original: User Interface; Information; mine stage (Planning, Operating, Closure); source (Waste Rock, Tailings, Mine Workings, Open Pit); treatment modules (Water Covers; Flooding; Migration Control; Active Methods: Covers, Sulphide Reduction, Collect & Treat, Bactericides; Passive Methods); decision boxes "Have methods been looked at together?" and "Is there a suitable method available?"; Compare (cost, probability of success); Design; Select Collect & Treat.]

Fig. 2. Basic ARDX Flowsheet.


KNOWLEDGE  ACQUISITION 

Initially  the  knowledge  acquisition  phase  included  choosing  and  interviewing  the  expert  as  well  as  extensive 
literature  searches  on  the  topic  of  ARD  treatment.  Through  interviews  with  the  expert,  the  framework  of  the 
system was established. Expertise was also taken from numerous and, sometimes, contradictory published papers
on  the  subject.  The  challenge  of  building  an  expert  system  for  ARD  would  appear  to  be  in  defining  rules  in 
a  field  where  many  decisions  are  presently  being  made  by  trial  and  error.  Case  studies  were  used  to  attempt 
to  mimic  the  actual  decision-making  process  followed  by  the  expert.  Acquiring  expertise  is  an  ongoing 
process  as  the  system  is  developed  and  expanded. 

DEVELOPMENT  OF  THE  SYSTEM 

ARDX  operates  within  the  COMDALE/X  environment.  Information  is  represented  using  keyword  triplets;  a 
method  which  assigns  an  attribute  and  value  to  each  object.  Data  in  a  keyword  triplet  can  be  stored  as 
strings,  floating  point  numbers,  dates,  logical  fuzzy  variables,  etc.  [4]. 
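
The keyword-triplet idea can be illustrated with a small sketch. This is not COMDALE/X syntax; it is a plain Python stand-in, assuming only that a fact is addressed by an object and an attribute and holds one value. The class name `TripletStore` and every object and attribute name in it are hypothetical.

```python
from datetime import date


class TripletStore:
    """Minimal stand-in for a keyword-triplet store: (object, attribute) -> value."""

    def __init__(self):
        self._facts = {}

    def put(self, obj, attribute, value):
        """Store or overwrite the value held for (object, attribute)."""
        self._facts[(obj, attribute)] = value

    def get(self, obj, attribute, default=None):
        """Return the stored value, or `default` when the fact is missing."""
        return self._facts.get((obj, attribute), default)


# Hypothetical site facts of the kinds listed in the text.
store = TripletStore()
store.put("drainage", "pH", 3.2)                     # floating point number
store.put("site", "mine_stage", "closure")           # string
store.put("site", "survey_date", date(1999, 7, 10))  # date
store.put("cover", "success_high", 70)               # degree of belief, 0-100
```

Returning a default for a missing fact mirrors the point made in the Background section: the system has to keep reasoning when site data are absent.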

The  user  interface  consists  of  pop  up  FORMS,  text  boxes  and  hypertext  documents.  Through  "forms",  data 
consisting  of  drainage  characteristics  and  site  specifics  are  input  by  the  user  and  stored  as  keyword  triplets. 
The  system  will  communicate  with  the  user  in  the  event  of  any  inconsistency  in  the  input  data  via  text 
boxes.  Once  a  conclusion  has  been  reached,  the  output  is  displayed  in  a  hypertext  document. 

The  first  step  in  successfully  developing  ARDX  "main"  was  to  write  a  set  of  preliminary  rules  to  call  upon 
the sub-modules. Output from the sub-modules is based on the concepts "high", "medium", and/or "low"
assigned  to  the  probability  of  successful  mitigation  or  prevention  and  the  capital  cost  (according  to  the 
amount  of  money  available  for  the  project)  of  the  treatment  option.  Information  from  each  sub-module  can 
be  used  as  inputs  to  other  sub-modules  as  the  system  moves  through  the  modules  again  to  review  the  option 
of  using  combinations  of  the  different  treatment  systems. 




Development  of  the  sub-modules  is  ongoing  as  new  ones  are  continually  being  added.  The  "ACTIVE 
METHODS"  module  is  itself  (like  ARDX  "main")  a  smaller  driving  module  that  calls  upon  the  various 
active  treatment  sub-modules.  This  secondary  smaller  driving  module  was  necessary  because  of  the  large 
number of options available. The outputs of each module (probability of success) and the cost (calculated
separately) are compared through Fuzzy Associative Memory (FAM) maps and given a Degree of Belief
(DoB) in the treatment option.

The  “COVERS”  sub-module  was  developed  first.  This  module  is  part  of  the  extended  "ACTIVE 
METHODS"  sub-module  and  decides  on  the  probability  of  a  cover  to  be  used  as  an  active  treatment  option. 
Inputs  of  site  details  and  characteristics  are  placed  into  fuzzy  sets.  Through  these  fuzzy  sets,  inputs  are 
assigned  a  membership  value  in  a  set  and  a  Degree  of  Belief  (DoB)  in  the  concept  "low",  "medium",  and/or 
"high"  [4]  [5].  Figure  3  shows  the  FAM  map  elements  that  comprise  the  "COVERS"  module. 


Fig.  3.  "COVERS"  Module  Flowchart 
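
The step of placing a site input into fuzzy sets can be sketched as follows. The paper does not give the membership functions used in ARDX, so the variable (annual precipitation in mm), the breakpoints and the function names below are all hypothetical; the sketch only shows how one crisp input yields a Degree of Belief (0 to 100) in each of "low", "medium" and "high".

```python
def membership(x, points):
    """Piecewise-linear membership; `points` is a list of (x, degree) sorted by x."""
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical fuzzy sets for annual precipitation (mm); breakpoints are assumptions.
PRECIPITATION_SETS = {
    "low": [(400.0, 1.0), (800.0, 0.0)],
    "medium": [(400.0, 0.0), (800.0, 1.0), (1200.0, 0.0)],
    "high": [(800.0, 0.0), (1200.0, 1.0)],
}


def fuzzify(x, fuzzy_sets):
    """Return the Degree of Belief (0-100) in each concept for the crisp input x."""
    return {name: 100.0 * membership(x, pts) for name, pts in fuzzy_sets.items()}


dob = fuzzify(600.0, PRECIPITATION_SETS)
```

With these assumed breakpoints an input of 600 mm belongs partly to "low" and partly to "medium", which is the overlap the FAM maps then work with.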

A  FAM  map  is  a  means  to  depict  rules  that  combine  to  determine  a  degree  of  belief  in  a  concept  from  a 
number of variables [5]. The FAM maps are created through interviews with the expert. They are used
within  the  "COVERS"  module  to  assess  input  information  and  decide  upon  an  appropriate  cover  choice. 
The  FAM  map  used  to  acquire  a  degree  of  belief  in  environmental  sensitivity  from  two  variables  that  are 
themselves  determined  through  other  FAM  maps  is  shown  in  Figure  4.  Certainty  Factors  (CF)  of  the 
concepts  “sensitive”,  “slightly  sensitive”,  and  “resistant”  need  not  add  up  to  100  as  there  may  be  an  overlap 
in  the  belief  in  each  concept  and  can  be  assigned  as  indicated  within  the  FAM  map. 




ENVIRONMENTAL SENSITIVITY FAM

                      Socio-Environmental Impact
Effluent Mobility     L                      M                      H
H                     s=30, ss=70, r=10      s=70, ss=40, r=0       s=100, ss=0, r=0
M                     s=0, ss=50, r=60       s=20, ss=90, r=10      s=60, ss=50, r=0
L                     s=0, ss=0, r=100       s=0, ss=40, r=70       s=30, ss=70, r=10

Fig.  4.  FAM  map  for  Environmental  Sensitivity  of  a  site. 

(ss  =  slightly  sensitive,  s  =  sensitive,  r  =  resistant) 
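
How such a FAM map might be evaluated can be sketched in a few lines. The certainty factors below are transcribed from Fig. 4. The combination operators are an assumption, since the paper does not state them: each rule fires with the minimum of its two input Degrees of Belief, the rule's certainty factor scales that strength, and the maximum over all rules is kept for each concept. The function name is hypothetical.

```python
# Certainty factors (0-100) transcribed from the Environmental Sensitivity FAM (Fig. 4).
# Key: (effluent mobility, socio-environmental impact).
# s = sensitive, ss = slightly sensitive, r = resistant.
FAM = {
    ("H", "L"): {"s": 30, "ss": 70, "r": 10},
    ("H", "M"): {"s": 70, "ss": 40, "r": 0},
    ("H", "H"): {"s": 100, "ss": 0, "r": 0},
    ("M", "L"): {"s": 0, "ss": 50, "r": 60},
    ("M", "M"): {"s": 20, "ss": 90, "r": 10},
    ("M", "H"): {"s": 60, "ss": 50, "r": 0},
    ("L", "L"): {"s": 0, "ss": 0, "r": 100},
    ("L", "M"): {"s": 0, "ss": 40, "r": 70},
    ("L", "H"): {"s": 30, "ss": 70, "r": 10},
}


def environmental_sensitivity(mobility, impact):
    """Combine DoBs (0-100) in L/M/H mobility and impact into DoBs in s, ss and r."""
    result = {"s": 0.0, "ss": 0.0, "r": 0.0}
    for (m, i), cfs in FAM.items():
        # Rule strength: AND taken as the minimum of the two inputs (assumed operator).
        strength = min(mobility.get(m, 0.0), impact.get(i, 0.0))
        for concept, cf in cfs.items():
            # Scale by the rule's certainty factor, keep the strongest rule (assumed).
            result[concept] = max(result[concept], strength * cf / 100.0)
    return result


example = environmental_sensitivity({"M": 50, "H": 50}, {"M": 100})
```

Because each concept is resolved separately, the three outputs need not sum to 100, which is consistent with the remark on overlapping beliefs in the text.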

As  the  number  of  variables  necessary  to  decide  on  a  concept  increases,  the  size  and  complexity  of  the  FAM 
maps  also  increase  resulting  in  large  multi-dimensional  maps  of  the  decision-making  process.  However,  by 
using  a  two-dimensional  FAM  map  approach  as  shown  above,  this  complexity  can  be  separated  into  unique 
modules  which  are  easy  to  understand  and  develop  in  consultation  with  the  expert. 

A  separate  cost  module  has  been  developed  and  is  accessible  by  all  modules  as  necessary.  Calculated  costs 
for  an  option  can  be  used  as  inputs  to  modules.  Costs  for  each  remediation  option  are  calculated  using  unit 
prices [6] and site-specific information. To account for future cost variability, the module updates all
information  according  to  the  Marshall  &  Swift  (M&S)  index  values  [7].  The  module  is  able  to  store  input 
M&S  values  for  future  reference  and  calculations. 

The economic evaluation of a treatment option is broken down into capital cost, maintenance and
inspection costs, and operating costs due to continued effluent treatment and sludge disposal. The net
present value of all on-going maintenance and operating costs is calculated from a user-defined rate of
return value (defaulted to 3.5 if unavailable).
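
The two cost calculations described above can be sketched as follows. This is an illustration under stated assumptions, not the ARDX cost module: costs are updated by the ratio of cost-index values, which is the usual way a cost index is applied; the default rate of 3.5 is read as a percentage; and the on-going costs are treated as a constant annual amount over a fixed horizon. The function names and the 20-year horizon are hypothetical.

```python
def escalate(cost_ref, index_ref, index_now):
    """Update a cost from the reference year to now using the ratio of index values."""
    return cost_ref * index_now / index_ref


def present_value(annual_cost, rate_percent=3.5, years=20):
    """Present value of a constant annual cost (ordinary annuity).

    rate_percent: rate of return, assumed to be a percentage (3.5 read as 3.5 %).
    years: evaluation horizon, a hypothetical default not given in the paper.
    """
    r = rate_percent / 100.0
    if r == 0.0:
        return annual_cost * years
    return annual_cost * (1.0 - (1.0 + r) ** -years) / r


# Hypothetical numbers: a capital cost quoted at index 1000, updated to index 1100,
# plus 20 years of maintenance and operating costs at the default rate.
capital_now = escalate(250_000.0, 1000.0, 1100.0)
total_cost = capital_now + present_value(12_000.0)
```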

Defuzzification  is  performed  using  a  weighted-average  approach  for  the  concepts  "no"  (an  unacceptable 
treatment  option),  "no  unless"  (an  acceptable  option  at  a  high  cost,  use  if  no  other  is  available),  "ok" 
(acceptable  option),  "good"  (acceptable  and  low-cost),  and  "very  good"  (most  cost  effective)  for  each 
treatment  option.  This  becomes  the  final  degree  of  belief  (DoB)  in  each  treatment  option  recommended. 
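
A weighted-average defuzzification over these five concepts might look like the sketch below. The score attached to each concept is an assumption made for illustration (the paper does not list the weights), as is the function name.

```python
# Hypothetical scores (0-100) for the five output concepts; the paper gives no values.
SCORES = {"no": 0.0, "no unless": 25.0, "ok": 50.0, "good": 75.0, "very good": 100.0}


def defuzzify(dob):
    """Weighted average of concept scores, weighted by the Degree of Belief in each."""
    total = sum(dob.values())
    if total == 0:
        return 0.0  # no concept supported: treat the option as not recommended
    return sum(SCORES[concept] * belief for concept, belief in dob.items()) / total


final_dob = defuzzify({"ok": 50, "good": 50})
```

An option believed equally "ok" and "good" lands midway between the two scores, so treatment options can be ranked on a single number.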

The  output  hypertext  display  provides  a  list  of  recommended  treatment  options,  the  probability  of  success 
and  the  cost  demanded  by  each  option.  The  user  is  able  to  "click"  through  the  document  for  justification  of 
each  recommended  treatment  option  and  for  information  on  the  decision-making  process;  and  has  access  to 
justification  of  the  decision  making  process  within  the  individual  sub-modules. 


TESTING 

Testing  the  ARDX  system  is  currently  incomplete.  Actual  mining  cases  that  have  used  or  are  using 
treatment  options  similar  to  those  investigated  by  ARDX  will  be  adopted  to  test  the  system.  It  is  intended  to 
apply  both  successful  and  unsuccessful  cases  for  a  comparison  of  chosen  treatment  options,  their  success, 
and  treatment  options  decided  upon  by  ARDX. 

The  "COVERS"  sub-module  has  undergone  preliminary  testing  using  Samatosum  mine  data  [8].  The  data 
was  input  by  the  expert.  The  resulting  output  for  probable  cover  treatment  options  corresponded  to  one  of 
the  options  being  considered  for  the  mine. 




CONCLUSION 

Development  of  a  Fuzzy  Expert  System  on  the  design  of  ARD  remediation  plans  has  been  successful.  The 
system  has  the  following  benefits: 

•  a comprehensive, logical organization of the design methodologies has been developed

•  a consistent design philosophy can be generated by use of this system

•  a training tool has been created to assist in the transfer of ARD technology to the industry

•  economic and effective procedures to use for a wide variety of site problems are available

Future  expansion  of  this  system  will  include  ARD  predictions  based  on  expertise  derived  from  case  studies 
of existing sites. These predictions will be used as inputs to the existing system.


ACKNOWLEDGEMENT

The  authors  acknowledge  financial  support  from  the  National  Research  Council  through  IRAP  Grant  No: 
304695.  We  are  also  grateful  for  travel  support  from  the  Faculty  of  Graduate  Studies  and  Research  at  UBC. 


REFERENCES 

1.  R.W.  Lawrence,  A.  MacG.  Robertson,  1994.  Acid  rock  drainage  Understanding  the  problems  -  finding 
solutions. CIM District 6 Annual General Meeting, Workshop No. 1.

2.  L.  Filipek,  A.  Kirk,  W.  Schafer,  1996.  Control  Technologies  for  ARD.  Mining  Environment 
Management,  Dec,  4-8. 

3.  A.  Bowen,  1995.  Expert  systems:  Truth  and  Rumors.  Canadian  Mining  J.,  Mining  Sourcebook,  8-12. 

4.  J.A.  Meech,  C.A.  Harris,  1992.  Expert  Systems  for  Gold  Processing  Plants.  Randol  Gold  Forum, 
Vancouver,  B.C.,  31-39. 

5.  J.A.  Meech,  1995.  AI  Applications  in  the  Mining  Industry  into  the  21st  Century.  APCOM  XXV 
Conference,  93-101. 

6.  MEND,  1995.  Economic  Evaluation  of  Acid  Mine  Drainage  Technologies.  MEND  Report  5.8.1 

7.  Marshall,  Swift,  1999.  Marshall  &  Swift  Equipment  Cost  Index.  Chemical  Engineering,  106(3),  170. 

8.  M.  Ghomshei,  A.  Holmes,  E.  Denholm,  R.  Lawrence,  T.  Carriou,  1997.  Acid  Rock  Drainage  from  the 
Samatosum Waste Dump, B.C., Canada. Proc. 4th Inter. Conf. on Acid Rock Drainage. 1, 351-366.




Modeling  of  Gold  Heap  Leaching  for 
Criteria  of  Sustainability  Targets 

Roberto C. Villas-Boas*, Luiz R. P. de Andrade Lima**

* Center for Mineral Technology, Rua 4, Quadra D, Ilha do Fundao,
Rio  de  Janeiro,  RJ,  Brazil,  21941-590 
**  Polytechnic  School,  Federal  University  of  Bahia, 

Rua  Aristides  Novis,  2,  Salvador,  Ba,  Brazil,  40210-630 


ABSTRACT 

Sustainable development principles are forcing mining and metallurgical process and design engineers to
take proactive approaches and find prompt answers that minimize environmental impact, maximize energy
utilization throughout processing, reduce material flows and discards, and consider social
satisfaction per monetary unit of products and processes. This paper describes a computational algorithm
devised to simulate the temporal evolution of the gold ore heap leaching process, in an attempt to better
understand the phenomenology behind heap leaching and to provide insights into the development of a
sustainability indicator. The data used in the model include physical-chemical, geometrical and operational data, such
as:  leachable  metals  contents,  flow  rate  and  cyanide  concentration,  parameters  of  passivity,  ore  size 
distribution,  the  average  residence  time  of  the  solution  in  the  heap,  height,  irrigated  area  and  tonnage  of  ore 
in  the  heap.  The  shrinking-core  model,  describing  the  solid-fluid  reaction  under  diffusion  control,  was  used 
to  calculate  these  variables.  The  simulations  show  that  the  number  of  layers  has  little  effect  on  results 
indicating  a  stable  and  robust  algorithm.  The  average  residence  time  of  the  solution  in  the  heap  and  the 
effective  diffusivity  of  the  cyanide  solution  through  the  ore  particles  have  a  significant  influence  on  the 
temporal  evolution  of  gold  extraction  and  its  concentration  in  the  pregnant  solution,  so  these  parameters 
may  be  used  to  calibrate  the  model.  In  applying  the  algorithm  to  an  industrial  case,  the  results  show  that  the 
model  is  able  to  predict  the  process  performance  reasonably,  and  might  be  used  as  a  starter  for  Sustainable 
Development  decision-making  indicators,  since  it  reflects  changes  over  time  of  the  analysed  problem,  and 
the  results  are  reliable  and  reproducible. 


INTRODUCTION 

Criteria of sustainability are being sought in order to devise "green engineering" procedures to reach new
targets  imposed  by  society.  Environmental  constraints  are  receiving  the  greatest  attention  these  days,  and 
the  effectiveness  of  cyanide  leaching  of  gold  ores  in  particular,  is  under  question  [1].  These  criteria  are 
based  on  indicators  that  can  reflect  changes  over  a  period  of  time,  that  are  reliable  and  reproducible,  and, 
whenever  possible,  are  calibrated  in  the  same  terms  as  the  policy  goals  or  targets  linked  to  them  [2]. 

Heap leaching has been in use for years as an effective method to treat gold ores throughout the world. As is
well known, in this process, coarse ore is placed on an impervious surface, prepared with a small slope
relative to the horizontal so that the pregnant solution drains off. On top of the heap, a leach solution
is sprayed to progressively percolate down through the bed of ore. The pregnant solution at the bottom of the
heap is then sent to the recovery step.

Column  testing  or  experimental  small  heaps  are  utilized  to  estimate  the  leaching  characteristics  of  the  ore  body. 
However  scale-up  to  industrial  heaps  is  unadvisable  from  such  tests  due  to  poor  reproducibility  of  the  geometry 
of  the  heap  (particle  size,  heap  height,  length,  width  and  overall  slope)  and  the  hydrodynamics. 


These  difficulties,  associated  with  the  time  and  costs  required  to  prepare  the  testing  programs,  have  led  to 
development of phenomenological models to design and analyze heap leaching processes since the 60s and
throughout the 70s and 80s, in particular, for copper ores and pyrite. These models are based on the material
balance  of  the  reactants  using  a  continuity  equation  applied  to  the  heap  and  particles  using  specific  kinetic 
models.  In  order  to  solve  the  complex  system  of  partial  differential  equations  obtained,  simplifying  hypotheses 
are  introduced  without  losing  the  quality  of  the  results. 

A  brief  description  of  these  models  and  their  simplifying  hypotheses  was  presented  by  De  Andrade  Lima  et  al. 
[3], when reviewing the literature [4, 5, 6, 7, 8, 9].


THE  MODEL 

The algorithm is developed based on the hypotheses that: 1. the heap may be represented conveniently by a
simplified geometrical shape; 2. the liquid flow through the heap bed is plug flow; 3. the average residence
time of the solution in the heap does not vary with time or vertical location; 4. the heap presents a
homogeneous grade of leachable metals and size distribution; and 5. the ore/leaching agent reaction is
controlled by diffusion of leach solution through the large and weakly porous particles of the ore.

To  build  the  algorithm,  the  heap  is  divided  into  horizontal  layers  of  constant  area.  The  gold  recovery  from 
the  ore,  the  residual  concentration  of  cyanide  and  the  enrichment  of  the  leach  solution  can  be  calculated 
from  interactions  among  these  layers. 

For  each  heap  layer,  and  for  each  species  of  any  and  every  size  class,  the  model  equation  is  solved 
analytically  for  each  time  step.  The  flow  in  the  heap  is  considered  unidirectional  at  constant  volume 
flowrate, but the species concentrations are considered to vary with time. Flow dispersion is neglected.
Figure  1  illustrates  the  schematic  slicing  representation  of  the  heaps. 

It  is  worthwhile  to  mention  that  all  simplifying  assumptions,  regarding  unidirectional  flow,  constant 
volumetric  flowrate,  neglecting  flow  dispersion,  and  neglecting  interactions  between  the  main  variables  will 
affect the model precision, although not necessarily its robustness, which any model for environmental
decision-making should attempt to achieve.


Fig  1.  Schematic  representation  of  the  heap  for  the  model. 

If a heap of rectangular shape is divided into nl layers of equal thickness, as shown in Figure 1, the
pregnant solution and the leach solution flowing through the heap are retained in each layer
for a time $\Delta\tau = \tau / nl$. Since the average residence time of the solution in each layer is constant, the
liquid hold-up of the heap is given by Equation 1.

$$Q\,\tau = \varepsilon_{HB}\,\sigma_{HB}\,H_{HB}\,S_{HB} \tag{1}$$

where $\varepsilon_{HB}$ is the heap porosity, $\sigma_{HB}$ is the heap saturation, $\tau$ is the average residence time of the solution in
the bed of the heap, $Q$ is the rate of irrigation in the heap, $H_{HB}$ is the average heap height and $S_{HB}$ is the
average heap area.

Due to the coarse nature of the ore in the heap, diffusion control predominates in the ore/lixiviant
reaction, so the shrinking core model may be used, as in Equation 2 [10]:

$$\frac{d\alpha'_{t,jim}}{dt} = \frac{3\,C_{CN_{t,j-1}}\,D_{CN}}{\rho\,l_{cT}\,R_i^{2}\left[\left(1-\alpha'_{t,jim}\right)^{-1/3}-1\right]} \tag{2}$$

where $C_{CN_{t,j-1}}$ is the concentration of the free cyanide in the solution that enters layer j in time t, $D_{CN}$ is the
apparent diffusivity of the cyanide in the ore particles, $l_{cT}$ is the total lixiviant consumption, $R_i$ is the
average radius of the ore particles of the size fraction i, $\alpha'_{t,jim}$ is the recovery of the metal m in the size
fraction i of layer j in time t and $\rho$ is the ore density.


Equation  2.  may  be  algebraically  transformed  into  Equation  3.  [3],  which  is  analytically  solved  to  give  the 
individual  metal  (m)  recoveries,  at  time  (7-Ax)  originating  from  the  particles  of  size  /,  located  in  layer  j, 
when  the  recoveries  at  time  (t-l-Ax),  the  leach  solution  cyanide  concentration  from  layer  j-1  and  the 
individual  concentrations  of  the  metal  species  are  known. 


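$$\alpha_{tjim}^{3} + a_{tjim}\, \alpha_{tjim}^{2} + b_{tjim}\, \alpha_{tjim} + c_{tjim} = 0 \qquad (3)$$

where:

$$a_{tjim} = \frac{3 Z_{tjim}}{2} + \frac{27}{8}, \qquad b_{tjim} = \frac{3 Z_{tjim}^{2} - 27}{4}, \qquad c_{tjim} = \frac{Z_{tjim}^{3} + 27}{8}$$

and

$$Z_{tjim} = 2 K_{tjim}\, \Delta\tau - 3 \left(1 - \alpha_{t-1,jim}\right)^{2/3} - 2\, \alpha_{t-1,jim}$$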
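The time step described by Equation 3 can be sketched numerically. The snippet below is a minimal illustration, not the authors' code. It assumes the cubic $\alpha^3 + a\alpha^2 + b\alpha + c = 0$ with $a = 3Z/2 + 27/8$, $b = (3Z^2 - 27)/4$, $c = (Z^3 + 27)/8$ and $Z = 2K\Delta\tau - 3(1-\alpha_{prev})^{2/3} - 2\alpha_{prev}$, and it assumes that $K$ lumps the rate group $3 C_{CN} D_{CN} / (\rho\, l_{cT} R_i^2)$ of Equation 2. The function name, the tolerance and the root-selection rule (the real root lying between the previous recovery and 1) are choices made here for illustration.

```python
import numpy as np

def advance_recovery(alpha_prev, k, dt):
    """Advance the recovery of one size fraction by one time step.

    alpha_prev : recovery at the previous time (fraction, 0 <= alpha_prev < 1)
    k          : assumed lumped rate group 3*C_CN*D_CN / (rho*l_cT*R_i**2)
    dt         : residence time of the solution in the layer
    """
    z = 2.0 * k * dt - 3.0 * (1.0 - alpha_prev) ** (2.0 / 3.0) - 2.0 * alpha_prev
    if z >= -2.0:
        # The integrated rate law reaches z = -2 at complete conversion.
        return 1.0
    coeffs = [1.0,
              1.5 * z + 27.0 / 8.0,
              (3.0 * z**2 - 27.0) / 4.0,
              (z**3 + 27.0) / 8.0]
    tol = 1e-6
    # Keep real roots that lie between the previous recovery and 1.
    candidates = [r.real for r in np.roots(coeffs)
                  if abs(r.imag) < tol and alpha_prev - tol <= r.real <= 1.0 + tol]
    if not candidates:
        raise ValueError("no physically meaningful root found")
    return float(min(max(min(candidates), alpha_prev), 1.0))

# Hypothetical values: two successive steps starting from fresh ore.
alpha_1 = advance_recovery(0.0, 0.01, 1.0)
alpha_2 = advance_recovery(alpha_1, 0.01, 1.0)
```

Because the cubic is obtained by cubing the integrated rate law, it also has roots outside the interval from 0 to 1; restricting the search to that interval discards them.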


Physical constraints of the ore, which inhibit the leach solution from diffusing completely through the ore particles, give rise to an attenuation factor ($\theta_m$), as defined by Equation 4. This factor must be determined from laboratory experiments.


a,  Jim  = 


4. 


On the other hand, if the ore size fractions are known and assumed to be homogeneously distributed within each of the nl layers, and if, over the time intervals for which Equation 3 is solved, the grade of the particles does not vary and the metal species content of each size fraction is known, then the global recovery at each time increment in each layer is given by Equation 5.

$$\alpha^{L}_{tjm} = \sum_{i=1}^{nf} \alpha_{tjim}\, f_i \qquad (5)$$


Suppose that each of the nl layers has the same mass; then the global recoveries of each metal species at each time increment are given by Equation 6.

$$\alpha^{H}_{tm} = \frac{\sum_{j=1}^{nl} \alpha^{L}_{tjm}\, \gamma_{0jm}}{\sum_{j=1}^{nl} \gamma_{0jm}} \qquad (6)$$


The residual metal content in each layer of the heap can be calculated from Equation 7 for each instant of time. This is an extremely important feature for developing a Sustainability Development Indicator.

$$\gamma_{r,tjm} = \gamma_{0jm} \left(1 - \alpha^{L}_{tjm}\right) \qquad (7)$$

where $\gamma_{0jm}$ is the initial concentration of metal m contained in layer j.
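Equations 5 to 7 amount to weighted sums, which the following minimal sketch illustrates. It assumes that $f_i$ denotes the mass fraction of size fraction i (the text does not define it explicitly), that recoveries are stored as fractions between 0 and 1, and that the weights in Equation 6 are the initial metal contents of the layers. The function names, array names and sample numbers are hypothetical.

```python
import numpy as np

def layer_recovery(alpha_fraction, f):
    """Eq. 5: recovery of each layer as the sum over size fractions.

    alpha_fraction : array (n_layers, n_fractions) of recoveries per size fraction
    f              : array (n_fractions,) of assumed mass fractions, summing to 1
    """
    return alpha_fraction @ f

def heap_recovery(alpha_layer, gamma0):
    """Eq. 6: heap recovery, weighting each layer by its initial metal content."""
    return float(np.sum(alpha_layer * gamma0) / np.sum(gamma0))

def residual_content(alpha_layer, gamma0):
    """Eq. 7: metal content remaining in each layer."""
    return gamma0 * (1.0 - alpha_layer)

# Hypothetical two-layer, two-fraction example for one metal.
alpha_fraction = np.array([[0.8, 0.4],
                           [0.6, 0.2]])
f = np.array([0.5, 0.5])
gamma0 = np.array([3.0, 3.0])   # initial grade of each layer, ppm

alpha_layer = layer_recovery(alpha_fraction, f)
alpha_heap = heap_recovery(alpha_layer, gamma0)
gamma_res = residual_content(alpha_layer, gamma0)
```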


The cyanide concentration of the solution leaving layer j is calculated from Equation 8, whereas the concentrations of the metal species leaving layer j are obtained from Equation 9, where $C_{CN,tj}$ and $C_{CN,tj+1}$ are respectively the concentrations of free cyanide in the solution that enters and leaves layer j at time t, $C_{M,tjm}$ and $C_{M,tj+1,m}$ are respectively the concentrations of metal m in the solution entering and leaving layer j at time t, $\alpha^{L}_{tjm}$ and $\alpha^{L}_{t-1,jm}$ are the recoveries of metal m in layer j at the present and previous times, and $M_{HB}$ is the heap tonnage.


CCN,J+l  =  CCNtj  - 


M  HBlm 


\  S  hb  H  m 


CMlj+lm  =  Cm  tjm  + 


M 


y  S  HB  H  HB 


^  [ten,  (aL,jm  -  aL, _lyJ] 

fl  J  m= 1 
"lY"  ^ 


The proposed algorithm considers that the lixiviant enters the first layer (j = 1) at the top of the heap and remains there for a time $\Delta\tau = \tau / nl$. The solution is then transferred to the next layer (j = 2), and so on until it reaches the last one (j = nl). While the solution remains in layer j, Equation 3 is solved for each metal (m) contained in each size fraction (i), taking into account the residuals of the metals ($\gamma_r$) and the composition of the solution ($C_{CN}$ and $C_M$). Equations 4 through 9 are then applied to update these concentrations and tenors.


SENSITIVITY  ANALYSIS  OF  THE  MODEL 

In order to evaluate the effect of the input variables, a numerical design of experiments was performed, taking as measured responses a set of four parameters that characterize the time-evolution curve, as shown in Figures 2 to 4. These parameters are: $t_{1/2}$, the half-life time of the heap (days); $r^0$, the rate of recovery at the start of the process (%/day); $C_{Au,in}$, the gold concentration in the first drop of leach liquor (ppm); and $t_{CN}$, the time when the non-reacted cyanide solution begins to flow into the heap (day). A 2-level fractional experimental design consisting of 14 variables and 20 numerical experiments was developed. Table 1 shows the values of the high (+) and low (-) levels of each variable, chosen to account for the ranges reported in the literature. Table 2 shows the experimental matrix and the responses of this arrangement. The results were analysed statistically and by cluster analysis and principal components analysis [11, 12, 13].
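As an illustration of how such a two-level screening design can be generated and analysed, the sketch below builds a 20-run, 19-column Plackett-Burman matrix by cyclic shifts of a generator row, closes it with a run at all low levels, and estimates each main effect as the difference between the mean response at the high and low levels. It assumes that the design in Table 2 follows this construction, which its sign pattern suggests. The function names are illustrative, and the response vector is a synthetic placeholder, not data from the paper.

```python
import numpy as np

# Generator row of the 20-run Plackett-Burman design (19 columns).
GENERATOR = "++--++++-+-+----++-"

def plackett_burman_20():
    """Return the 20 x 19 design matrix with entries +1 (high) and -1 (low)."""
    first = np.array([1 if s == "+" else -1 for s in GENERATOR])
    rows = [np.roll(first, k) for k in range(19)]   # cyclic right shifts
    rows.append(-np.ones(19, dtype=int))            # final run: all low levels
    return np.array(rows)

def main_effects(design, response):
    """Mean response at the high level minus mean response at the low level."""
    response = np.asarray(response, dtype=float)
    high = design == 1
    return np.array([response[high[:, c]].mean() - response[~high[:, c]].mean()
                     for c in range(design.shape[1])])

design = plackett_burman_20()

# Synthetic response that depends only on the first column.
synthetic = 5.0 + 2.0 * design[:, 0]
effects = main_effects(design, synthetic)
```

Columns that are not assigned to a physical variable act as dummy columns; their apparent effects give a rough estimate of the noise against which the real effects can be judged.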


Fig. 2. Parameters utilized to characterize the performance of the process ($t_{1/2}$, $r^0$).

Fig. 3. Parameters utilized to characterize the performance of the process ($C_{Au,in}$).

Fig. 4. Parameters utilized to characterize the performance of the process ($t_{CN}$). [Plot against Time (day).]


Table 1. Data for the sensitivity analyses.

VARIABLE     UNIT           LEVEL (-)      LEVEL (+)
D_CN         [m² h⁻¹]       3.5 x 10⁻⁷     7.0 x 10⁻⁷
ρ            [g cm⁻³]       2.5            5.0
γ_Au         [ppm]          2.0            4.0
θ_Au         [%]            80             100
γ_Ag         [ppm]          250            500
θ_Ag         [%]            70             100
l_cT         [g kg⁻¹]       0.5            1.0
H_HB         [m]            2.5            5.0
M_HB/S_HB    [t m⁻²]        5.0            10.0
ε_HB         [%]            45             75
σ_HB         [%]            10             20
C_CN         [g l⁻¹]        0.5            1.0
Q/S_HB       [l h⁻¹ m⁻²]    6.5            13.0
R            [mm]           20             40
nl                          20             50


Table 2. Design of experiments and numerical results.

LEVEL OF THE VARIABLES (columns 1 to 19, in order): D_CN, ρ, γ_Au, θ_Au, γ_Ag, θ_Ag, l_cT, H_HB, M_HB/S_HB, ε_HB, σ_HB, C_CN, Q/S_HB, nl, R, V1, V2, V3, V4
RESULTS OF SIMULATION: t1/2 (days), r0 (%/day), C_Au,in (ppm), t_CN (day)

TEST  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19     t1/2      r0  C_Au,in   t_CN
   1  +  +  -  -  +  +  +  +  -  +  -  +  -  -  -  -  +  +  -     6.81   33.28    1.600  2.644
   2  -  +  +  -  -  +  +  +  +  -  +  -  +  -  -  -  -  +  +    18.21   13.39    1.600  1.803
   3  +  -  +  +  -  -  +  +  +  +  -  +  -  +  -  -  -  -  +     4.97   19.82    4.000  3.990
   4  +  +  -  +  +  -  -  +  +  +  +  -  +  -  +  -  -  -  -    12.21   19.50    2.000  2.644
   5  -  +  +  -  +  +  -  -  +  +  +  +  -  +  -  +  -  -  -     7.75   15.86    6.400  3.269
   6  -  -  +  +  -  +  +  -  -  +  +  +  +  -  +  -  +  -  -    10.46   33.42    4.000  1.262
   7  -  -  -  +  +  -  +  +  -  -  +  +  +  +  -  +  -  +  -     3.97   66.78    2.000  1.500
   8  -  -  -  -  +  +  -  +  +  -  -  +  +  +  +  -  +  -  +     8.47   75.58    3.115  0.721
   9  +  -  -  -  -  +  +  -  +  +  -  -  +  +  +  +  -  +  -    17.17   15.75    0.800  0.829
  10  -  +  -  -  -  -  +  +  -  +  +  -  -  +  +  +  +  -  +    67.99    8.15    0.800  5.000
  11  +  -  +  -  -  -  -  +  +  -  +  +  -  -  +  +  +  +  -     7.42   26.97    6.400  3.173
  12  -  +  -  +  -  -  -  -  +  +  -  +  +  -  -  +  +  +  +     3.05   78.01    4.000  0.661
  13  +  -  +  -  +  -  -  -  -  +  +  -  +  +  -  -  +  +  +     3.88   31.73    3.200  1.635
  14  +  +  -  +  -  +  -  -  -  -  +  +  -  +  +  -  -  +  +     6.14   65.46    4.000  1.471
  15  +  +  +  -  +  -  +  -  -  -  -  +  +  -  +  +  -  -  +    15.63   47.68    2.492  0.361
  16  +  +  +  +  -  +  -  +  -  -  -  -  +  +  -  +  +  -  -     3.07  130.93    4.000  0.736
  17  -  +  +  +  +  -  +  -  +  -  -  -  -  +  +  -  +  +  -    37.09   16.63    2.000  0.793
  18  -  -  +  +  +  +  -  +  -  +  -  -  -  -  +  +  -  +  +    11.60   22.87    3.607  2.404
  19  +  -  -  +  +  +  +  -  +  -  +  -  -  -  -  +  +  -  +     7.31    8.29    1.000  6.130
  20  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -     4.90   54.32    1.600  0.793


Table 3 summarizes the sensitivity analysis for the proposed model. As can be seen, the number of heap subdivisions does not appreciably affect the responses, whereas diffusivity and residence time (represented by porosity and solution saturation) are the variables that most affect the responses; these are therefore the ore parameters that govern the proposed model. Because they are not easily estimated or measured, yet have a substantial effect on the results, these variables can be used to calibrate the model.


Table 3. Results of the sensitivity analysis

Responses: t1/2, r0, C_Au,in, t_CN (each assessed by SA and by PCA-CA)

VARIABLE     SIGNIFICANCE MARKS
D_CN         x  *
ρ            x
γ_Au         *  x  *
θ_Au         *  *
γ_Ag
θ_Ag
l_cT         x  *  x  *  x  *  *
H_HB         *
M_HB/S_HB    x  *  *
ε_HB         x  *  *
σ_HB         x  *  x  *
C_CN         x  *  x  *  x  *
Q/S_HB       x  *  x  *
R            x  *  *  *
nl

SA : Statistical analysis of the design of experiments
PCA-CA : Principal component analysis and cluster analysis results
x : Significant in the statistical analysis at a confidence level of 90 %
* : Significant in the principal component analysis and cluster analysis



CASE  STUDY 

The model was tested with published data from the Fazenda Brasileiro Mine, operated by Companhia Vale do Rio Doce in the State of Bahia, Brazil [14] (Siqueira et al., 1985). This mine processes oxidized ore with a gold content of 3.5 ppm. The presence of sulfur and other leachable metal species is negligible. Since no detailed data were available to carry out the simulation, the values of the particle size distribution, the height of the heap and the irrigated area were taken as nominal ones. Since the effective diffusivity of the cyanide and the average residence time are the variables that most affect the model responses, they were used to perform the model calibration. Table 4 shows the values of the variables used in the simulation.


Table 4. Input data for simulation runs

VARIABLE     UNIT           Heap P-1A
D_CN         [m² h⁻¹]       5.0 x 10⁻⁹ **
ρ            [g cm⁻³]       2.7
l_cT         [g kg⁻¹]       0.50
γ_Au         [g t⁻¹]        3.45
θ_Au         [%]            92.2
H_HB         [m]            5.0 *
M_HB         [t]            31500
S_HB         [m²]           2333 *
τ            [day]          5.84 **
C_CN         [g l⁻¹]        1.5
Q/S_HB       [l h⁻¹ m⁻²]    10.7
R            [mm]           9.525 *
nl                          25

* nominal values   ** estimated parameters

Fig. 5. Comparison between simulated results and experimental data for an industrial heap. [Two panels, each titled "Companhia Vale do Rio Doce, Heap number P-1A", plotted against Time (day).]

The results of the simulation are shown in Figure 5. As can be seen, there is some nonconformity between the actual and simulated results, which may be due to the oversimplifications mentioned above, i.e., neglecting the size distribution, the variation of the surface area and of the height, and the radial and axial flow dispersion. However, the simulated curves are relatively close to the real ones, except at the beginning of the leaching process, which may be explained by the limitation of the flow model utilized and by the failure to consider the flow of solution through the impervious surface of the heap. In addition, the actual gold concentration in the leaching solution is maintained on the order of 0.5 to 1 part per million by recycling cyanide solution containing residual gold.




CONCLUSIONS 

1. Heap leaching processes for gold ores may be reasonably described by a model in which plug flow and diffusion control kinetics are considered for the heap and for the ore/cyanide reactions, respectively.

2. The analytical solution of the shrinking core model at each time interval gives robustness to the model.

3. The effective diffusivity of cyanide and the average residence time are the calibrating parameters of the model. They also have obvious environmental implications.

4. The development of correlations relating the average residence time of the solution in the heap to the operational parameters, as well as the use of the real size distribution, the average heap surface, the average heap height and a diffusive flow model, would definitely improve the precision of the simulation results; however, little improvement in accuracy is expected, owing to the robustness of the model per se.

5. The results shown are good indicators of the process performance and might thus be utilized as a starting point for decision-making procedures targeting the maximization of environmental capital, represented in this case by the heap itself, its chemical species and its solution flow.

6. It is worth stressing that the individual consumption of cyanide for each metal species of interest in a given time period is readily assessed, as is the eventual inhibition of leach solution diffusion, which makes it possible to build individual environmental indicators.

ACKNOWLEDGMENT 

One of the authors (L.R.P. De Andrade Lima) thanks the Conselho Nacional de Desenvolvimento Cientifico e Tecnologico of Brazil (CNPq) for the award of a scholarship throughout this project.

REFERENCES 

1. R.C. Villas-Boas, 1994. Materials Production and the Environment. Hydrometallurgy 94, Chapman & Hall, Suffolk, 107-121.

2. A. Hammond, et al., 1995. Environmental Indicators: A Systematic Approach to Measuring and Reporting on Environmental Policy Performance in the Context of Sustainable Development, World Resources Institute, May, p. 11.

3. L.R. De Andrade Lima, R.C. Villas-Boas, H.M. Kohler, 1998. Mathematical Modeling of Gold Ore Heap Leaching, International Symposium on Gold Recovery, Montreal, Canada, CIM, (in press).

4. R.J. Roman, B.R. Benner, G.W. Becker, 1974. Diffusion model for heap leaching and its application to scale-up, Trans. AIME, 256, 247-256.

5. J.C. Box, R. Yusuf, 1984. Simulation of heap and dump leaching process. Proceedings of the Symposium on Extractive Metallurgy, Melbourne, Australia, 117-124.

6. J.C. Box, A.P. Prosser, 1986. A general model for the reaction of several minerals and several reagents in heap and dump leaching. Hydrometallurgy, 16, 77-92.

7. A.P. Prosser, 1988. Simulation of gold heap leaching as an aid to ore-process development, Proceedings of the Precious Metals '89, 121-135.

8. D.G. Dixon, J.L. Hendrix, 1993. A mathematical model for heap leaching of one or more solid reactants from porous ore pellets, Metallurgical Transactions, 24B, 1087-1102.

9. Sanchez-Chacon and Lapidus, 1997. Model for heap leaching of gold ores by cyanidation, Hydrometallurgy, 44, 1-20.

10. G.F. Froment, K.B. Bischoff, 1979. Chemical Reactor Analysis and Design, J. Wiley & Sons, NY, 765pp.

11. L.R.P. De Andrade Lima, R.C. Villas-Boas, H.M. Kohler, 1995. Analise de sensibilidade de modelos usando as tecnicas de cluster analyses e Plackett-Burman, in: XVI Encontro Nacional de Tratamento de Minerios e Hidrometalurgia, Rio de Janeiro, Brazil.

12. J.C. Cassa, L.R.P. De Andrade Lima, 1997. Screening variables in complex systems: A comparative study, Proceedings of the XX International Mineral Processing Congress, Aachen, Germany, 1, 433-444.

13. L.R.P. De Andrade Lima, R.C. Villas Boas, H.M. Kohler, 1998. Modeling of gold heap leaching for criteria of sustainability targets, in: S.A. Atak, G. Onal & M.S. Çelik (Editors), Innovations in Mineral and Coal Processing, Balkema, Rotterdam, p. 541.

14. L.T. Siqueira, R. Madeira, M. Fiuza, S. Nakamura, M.C. Reinhardt, I. Trancosos, 1985. Projeto Ouro Bahia - "Fazenda Brasileiro" (CVRD), in: I Simposio Internacional do Ouro, Rio de Janeiro, 1-22.




Design  Optimisation  of  Aluminum  Recycling 
Process  using  the  Taguchi  Approach 

A.R.  Khoei,  D.T.  Gethin,  I.  Masters 

Mechanical  Engineering  Department,  University  of  Wales  Swansea,  Singleton  Park, 

Swansea,  SA2  8PP,  U.K. 


ABSTRACT 

This  paper  describes  an  experimental  investigation  into  the  process  parameter  effects  on  product  quality  in 
aluminium  recycling.  In  order  to  optimise  the  aluminium  recovery  process,  the  factors  which  have  the 
greatest  influence  have  to  be  identified  and  optimum  values  chosen.  In  the  re-melting  of  scrap,  the  ultimate 
goal  is  to  produce  clean  aluminium  while  minimising  metal  losses.  In  this  project,  a  Taguchi  method  is 
used  initially  to  plan  a  minimum  number  of  experiments.  Orthogonal  array  experiments  are  used  as  these 
allow  the  simultaneous  variation  of  several  parameters  and  the  investigation  of  interactions  between 
parameters.  Standard  L4  and  L9  orthogonal  arrays  are  employed  to  evaluate  the  effects  of  parameters  under 
changing  conditions.  Statistical  analysis,  such  as  ANOVA,  is  then  employed  to  determine  the  relationship 
between  the  processing  conditions  and  the  yield  levels.  This  investigation  has  indicated  the  parameters 
where  process  control  is  important  and  allowed  the  elimination  of  some  parameters  from  the  main 
experimental  programme. 


INTRODUCTION 

Industries are facing not only demands for increased productivity but also stringent legislation on the release of by-products into the environment. Both aspects are particularly important for the aluminium recycling industries, which convert aluminium dross and coated scrap into raw material. The role of recycling in the aluminium industry cannot be overstated. Recycling is a critical component of the industry, both for its contribution to the environment and because of its favourable economic impact on production. This dual benefit is probably the reason aluminium beverage cans now account for the total beverage can market in the USA. The demand for used cans is strong and virtually guaranteed. When considering solid waste issues, it is important to remember that the aluminium can is solid value, not solid waste.

The melting of aluminium dross and scrap materials to recover aluminium as metal is a simple yet effective method of recycling a valuable material with a high inherent energy content. There are several processes for recycling aluminium, but a viable melting option available to processors of aluminium dross and scrap is the 'rotary furnace', which is commonly used in large-scale aluminium recycling. Once a can has been collected from a collection point, it is crushed and taken to a recycling plant. The aluminium is then loaded into a hot furnace that is designed to remove all the paint and dirt from it. The furnace is heated until the paint and coatings boil off the aluminium and are sucked out of the furnace by powerful fans. At the same time, the furnace melts the aluminium completely and mixes it to make sure that it is of the right quality to be used again.

Recently, IMCO Recycling Ltd. has sought to introduce a scientific understanding of the process with a view to improving productivity and quality, reducing waste and developing process models. The purpose of this research is to address aluminium recovery as a manufacturing process and to establish the dominant factors that need to be controlled in order to improve an already efficient process.

The paper presents an application of the Taguchi orthogonal array technique [1] to the aluminium recycling process. A matrix experiment based on an orthogonal array is conducted to change the settings of the various process parameters. The effects of the different factors are determined by computing simple averages and identifying the optimum level of each factor. The relative effect of the various factors is then calculated using an analysis of variance. This investigation has highlighted key areas of recycling where close control is required to improve the consistency of the process. A similar method was recently presented by Wells et al. [2] for the re-melting of scrap in Reynolds' Reclamation Plants. The Reynolds' Reclamation Plants utilise a reverberatory furnace, but the method appears to hold for both types of operation.

0-7803-5489-3/99/$10.00 ©1999 IEEE.

This application of Taguchi methods focuses on how to conduct process control activities cost-effectively during the aluminium recycling process. The method helps to diagnose the health of the process, minimise the production of defects and achieve an equilibrium between being 'quality conscious' and being 'cost conscious'.


ALUMINIUM  RECYCLING  PROCESS 

The melting process is based on a rotary furnace, which consists of a cylindrical steel drum and a chamber. The furnace is heated by a burner located in the furnace drum. The burner system generates temperatures of over 1700°C inside the refractory-lined drum.

At the commencement of a 'heat', flux is charged into the furnace along with a pre-weighed amount of feed material. On completion of charging, the burner is ignited and the furnace is rotated. The furnace fume is collected in the charging chamber, where it is extracted by the furnace fume gas cleaning system. Once the charge is molten, the furnace is stopped and the aluminium is decanted. The molten metal is then discharged from the furnace and directed either to moulds, where it solidifies, or into preheated crucibles.


DESIGN  OF  EXPERIMENTS 

The aim of the experimental investigation is to understand the aluminium recycling process more fully, together with its implications for the control of recovery. The orthogonal array technique may be used for experimental design, as it reduces the number of experiments required to investigate a set of parameters and can be used to indicate interactions between the parameters investigated [3]. It is especially important to minimise time and costs when performing experiments in a production aluminium plant.

At the first stage of this project, a detailed experimental investigation of the existing process was undertaken. Initial brainstorming sessions produced a list of over 100 variables that are believed to influence the recycling process. These are divided into control parameters and process stability parameters. The first group consists of parameters that can be adjusted during the recycling of aluminium, have a significant impact on the process and are used to correct recovery variation while running. The second group consists of parameters that are either left to vary while the process is running or are adjusted by the operator to obtain optimum running conditions. Establishing which of these are key to maintaining process stability is felt to be essential to the long-term success of the project.

In order to investigate the process control and stability factors, experiments need to be undertaken during production. The recycling process is fully instrumented for the measurement of parameters including load weight, melting temperature and rotation speed of the furnace. An efficient way to study the effect of several control factors simultaneously is to plan matrix experiments using orthogonal arrays. A series of experiments using a Taguchi-type design was performed during normal production at IMCO Recycling Ltd. This allowed the effects of the factors to be assessed without serious disruption of production.

Due  to  the  commercially  sensitive  nature  of  the  process  being  monitored,  the  factors  are  denoted  by  A,  B 
and  C,  and  the  levels  are  related  to  standard  operating  conditions. 


Table 1. Test 1 factors and their levels

Factors    Level 1    Level 2
A          a          a + 2
B          b          b + 60
C          c          c + 1



Table 2. Test 1 matrix experiment

Exp. No.    A    B    C    Recovery (%)
1           1    1    1    r + 8.1
2           1    2    2    r + 6.2
3           2    1    2    r + 6.7
4           2    2    1    r + 7.7


After  creating  a  Taguchi  orthogonal  array,  the  selected  process  parameters  are  varied  slightly  from  standard 
values,  which  results  in  varying  yield  levels.  Standard  L4  and  L9  orthogonal  arrays  are  considered  to 
evaluate  the  optimum  level  for  each  factor  in  the  recovery  of 'coated'  and  'class'  scrap  materials. 

Test 1: An L4 Orthogonal Array

In the first series of experiments, the simplest orthogonal array, L4, is considered in determining the effect of three process parameters in the recycling of 'coated scrap' materials. For each parameter, two levels are chosen to cover the experimental region, as listed in Table 1. The matrix experiment selected for this case is given in Table 2. It consists of four individual experiments corresponding to the four rows. The entries in the matrix represent the levels of the factors. In order to conduct the matrix experiment at different temperature levels, the furnace was run with the furnace PID controller set to the indicated temperatures, as shown in Figure 1.


Fig.  1.  Variation  of  temperature  with  time  for  different  experiments  using  the  furnace  PID  control 


Table 3. Test 1 average of signal-to-noise ratio

Factors    Level 1        Level 2
A          r + 7.1
B          r + 7.4 *
C          r + 7.9 *

* indicates the optimum level



Table 4. Test 1 ANOVA analysis for recovery

Factors    Degrees of freedom    Sum of squares    Mean square    Variance ratio
A          1                     0.0025 *          0.0025
B          1                     0.2025 *          0.2025
C          1                     2.1025            2.1025         20.51
Error      0                     0                 -
Total      3                     2.3075
(Error)    (2)                   (0.205)           (0.1025)

* indicates sum of squares added together to estimate the pooled error sum of squares indicated by parentheses


A summary statistic of recovery, called the signal-to-noise (S/N) ratio, is employed to find the optimum level [4, 5]. Taking the numerical values of recovery listed in Table 2, the average recovery for each level of the three factors can be obtained, as listed in Table 3. The optimum level for each factor is the level that gives the highest value of recovery in the experimental region. Thus, the high level of factor A, along with the low level of factors B and C, gives the highest recovery.
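The averaging step can be sketched as follows. The snippet is a minimal illustration in Python (the paper does not specify an implementation): it takes the L4 level assignments and the recovery offsets of the Test 1 matrix experiment, expressed relative to the undisclosed baseline r, and returns the mean offset at each level of each factor together with the level that maximises it. The function and variable names are illustrative.

```python
import numpy as np

# L4 orthogonal array for Test 1 (columns A, B, C) and the recovery
# offsets relative to the undisclosed baseline r.
L4 = np.array([[1, 1, 1],
               [1, 2, 2],
               [2, 1, 2],
               [2, 2, 1]])
recovery = np.array([8.1, 6.2, 6.7, 7.7])

def level_means(array, response, names):
    """Average response at each level of each factor."""
    means = {}
    for col, name in enumerate(names):
        levels = np.unique(array[:, col])
        means[name] = {int(l): float(response[array[:, col] == l].mean())
                       for l in levels}
    return means

means = level_means(L4, recovery, "ABC")
# For a larger-the-better response, the optimum level has the highest mean.
best = {factor: max(m, key=m.get) for factor, m in means.items()}
```

With these numbers the preferred levels come out as level 2 for A and level 1 for B and C, which agrees with the conclusion stated in the text.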

The relative magnitude of the effect of different factors can be obtained by the decomposition of variance, called analysis of variance (ANOVA) [4]. An ANOVA for estimating the error variance for the factor effects and the variance of the prediction error is given in Table 4. In this table, the number of independent parameters associated with a matrix experiment or a factor is called its 'degrees of freedom'. The matrix experiment with four rows has three degrees of freedom associated with the total sum of squares. Each factor with two levels has only one independent parameter and hence one degree of freedom. Therefore, the number of degrees of freedom for the error comes out to be zero.

The sums of squares due to the various factors, tabulated in Table 4, are a measure of the relative importance of the factors in changing the values of recovery. It can be seen that factor C explains a major portion of the total variation of recovery. In fact, it is responsible for (2.1025/2.3075) x 100 = 91.1 percent of the variation of recovery. Factors B and A are responsible for only a small portion of the variation in recovery, namely 8.8 and 0.1 percent respectively. It should be noted that because the error term has no degrees of freedom associated with it, its sum of squares is zero. In Table 4, the mean square for a factor is computed by dividing the sum of squares by the degrees of freedom.

An estimate of the sum of squares for the error term can be obtained by pooling the sums of squares corresponding to the factors having the lowest mean squares [4], i.e. factors A and B. These two factors together account for two degrees of freedom and a pooled sum of squares of 0.205, as indicated by parentheses. Hence, the error variance is 0.1025. The variance ratio is the ratio of the mean square due to a factor to the error mean square. The large value of the variance ratio, 20.51, means that the effect of factor C is quite large compared to the error variance.
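The decomposition and pooling described above can be reproduced with a short calculation. The sketch below is illustrative only: it recomputes the sums of squares from the Test 1 recoveries, pools factors A and B into the error term as the text describes, and forms the variance ratio for factor C. The function and variable names are assumptions made for this example.

```python
import numpy as np

L4 = np.array([[1, 1, 1],
               [1, 2, 2],
               [2, 1, 2],
               [2, 2, 1]])
recovery = np.array([8.1, 6.2, 6.7, 7.7])   # offsets relative to r

def factor_sum_of_squares(levels, response):
    """Sum over levels of (runs at level) * (level mean - grand mean)^2."""
    grand = response.mean()
    return float(sum(np.sum(levels == l) * (response[levels == l].mean() - grand) ** 2
                     for l in np.unique(levels)))

ss = {name: factor_sum_of_squares(L4[:, c], recovery)
      for c, name in enumerate("ABC")}
ss_total = float(np.sum((recovery - recovery.mean()) ** 2))
contribution = {name: 100.0 * value / ss_total for name, value in ss.items()}

# Pool the two smallest effects (A and B) to estimate the error variance.
pooled = ["A", "B"]
error_ss = sum(ss[name] for name in pooled)
error_dof = len(pooled)            # one degree of freedom per two-level factor
error_variance = error_ss / error_dof

# Variance ratio of factor C: mean square (1 degree of freedom) over error variance.
variance_ratio_c = (ss["C"] / 1) / error_variance
```

Since an L4 array with three factors is saturated, the three factor sums of squares add up exactly to the total, which is why an error estimate is only available through pooling.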


Table 5: Test 2 factors and their levels

Factors    Level 1    Level 2    Level 3
A          a - 1.5    a          a + 1.5
B          b          b + 20     b + 40
C          c - 1      c          c + 1



Table 6: Test 2 matrix experiment

Exp. No.    A    B    C    e *    Recovery (%)
1           1    1    1    1      r + 5.3
2           1    2    2    2      r + 2.9
3           1    3    3    3      r + 9.2
4           2    1    2    3      r + 6.1
5           2    2    3    1      r + 0.4
6           2    3    1    2      r + 1.8
7           3    1    3    2      r + 5.2
8           3    2    1    3      r + 4.5
9           3    3    2    1      r + 6.3

* empty column is denoted by e.


Test  2:  An  L9  Orthogonal  Array 

In the second series of experiments, an L9 orthogonal array is employed in the recycling of 'class scrap' materials. As in the last test, three control factors A, B and C are selected for process optimisation. For each factor, three levels are chosen to cover a wide region of variation. These factors and their alternate levels are listed in Table 5. The L9 orthogonal array is given in Table 6; it consists of nine experiments, corresponding to the nine rows, and four columns. In this matrix, the three chosen factors are assigned to columns 1 through 3, and column 4 is arbitrarily designated as an empty column.

After conducting the experiments listed in Table 6, the next step in data analysis is to estimate the effect of each control factor and to perform an analysis of variance (ANOVA), as described in the last test. The factor effects for the recovery of 'class scrap' materials and the respective ANOVA are given in Table 7. The optimum level for each factor is obtained from the average recovery at each level. From these observations, the optimum setting of factor A is the lowest level. For factor B, however, the recovery is highest at the highest level. The results also suggest that the middle level of factor C is most appropriate for higher recovery.

In the analysis of variance (ANOVA), there are eight degrees of freedom for a matrix experiment with 9 rows
and two for each factor with 3 levels. Thus, the error has two degrees of freedom. In order to estimate the
relative importance of the factors, the sum of squares due to each factor is obtained. It can be seen that
factors B and A, with 33.7 % and 27.9 % respectively, are responsible for a large portion of the total
variation of recovery in this type of material. However, the variance ratios show that the effects of these
two factors are not large relative to the error variance. It should be noted that the error variance
calculated in Table 7 is obtained by pooling the error sum of squares with that of factor C, which has the
lowest mean square. Thus, this factor together with the error accounts for 4 degrees of freedom, a pooled
sum of squares of 21.33 and an error variance of 5.33, as shown in parentheses.


Table 7. Test 2, average of S/N ratio and ANOVA analysis for recovery

Factors   Levels                              Degrees of   Sum of    Mean     Variance
          1            2            3         freedom      squares   square   ratio
A         r + 5.8 §    r + 2.8                2            15.5      7.75
B         r + 5.5      r + 2.6                2            18.7      9.35
C         r + 3.9      r + 5.1 §              2            2.5*      1.25
Error                                         2            18.8*     9.4
Total                                         8            55.5
(Error)                                       (4)          (21.3)    (5.33)

§ indicates the optimum level
* indicates sum of squares added together to form the pooled error sum of squares shown by parentheses
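
The level averages and sums of squares of Table 7 can be recomputed from the recoveries in Table 6. The
following Python sketch is an illustration rather than the authors' own calculation: it uses only the
offsets x from the entries "r + x" (the base recovery r cancels in every deviation), and all function
names are assumptions. Recomputed values may differ by a few tenths from the printed ones, for example
because the tabulated recoveries are rounded.

```python
# Rows of the L9 array from Table 6: levels of A, B, C, e and the
# recovery offset x taken from "r + x" (r cancels in every deviation).
L9 = [
    (1, 1, 1, 1, 5.3),
    (1, 2, 2, 2, 2.9),
    (1, 3, 3, 3, 9.2),
    (2, 1, 2, 3, 6.1),
    (2, 2, 3, 1, 0.4),
    (2, 3, 1, 2, 1.8),
    (3, 1, 3, 2, 5.2),
    (3, 2, 1, 3, 4.5),
    (3, 3, 2, 1, 6.3),
]
COLUMN = {"A": 0, "B": 1, "C": 2, "e": 3}
GRAND_MEAN = sum(row[4] for row in L9) / len(L9)


def level_means(name):
    """Average recovery offset at levels 1, 2 and 3 of one column."""
    col = COLUMN[name]
    means = []
    for level in (1, 2, 3):
        values = [row[4] for row in L9 if row[col] == level]
        means.append(sum(values) / len(values))
    return means


def sum_of_squares(name):
    """Sum of squares due to one column (three runs per level)."""
    return 3 * sum((m - GRAND_MEAN) ** 2 for m in level_means(name))


def pooled_variance_ratio(name, pooled=("C", "e")):
    """Variance ratio of a factor against the error pooled from `pooled`."""
    pooled_ss = sum(sum_of_squares(p) for p in pooled)
    pooled_dof = 2 * len(pooled)  # two degrees of freedom per column
    error_variance = pooled_ss / pooled_dof
    mean_square = sum_of_squares(name) / 2
    return mean_square / error_variance


TOTAL_SS = sum((row[4] - GRAND_MEAN) ** 2 for row in L9)

for name in ("A", "B", "C", "e"):
    means = ", ".join(f"r + {m:.1f}" for m in level_means(name))
    print(f"{name}: {means}; sum of squares = {sum_of_squares(name):.1f}")
print(f"total sum of squares = {TOTAL_SS:.1f}")
print(f"variance ratio: A = {pooled_variance_ratio('A'):.2f}, "
      f"B = {pooled_variance_ratio('B'):.2f}")
```

On these data the highest level averages fall at level 1 of A, level 3 of B and level 2 of C, in line with
the optimum settings discussed in the text.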




CONCLUSION 

In this paper a Robust Design approach is presented for improving productivity during the aluminium
recycling process and its development, so that high-quality products can be produced at low cost. The
Taguchi method was used to plan a minimum number of experiments. The method employs a signal-to-noise
ratio, which measures quality, and an orthogonal array, which is used to study design parameters
simultaneously. An analysis of variance (ANOVA) was then applied to evaluate the relative importance of the
various factors. A series of orthogonal array experiments was carried out at IMCO Recycling Ltd. with
minimal interruption. The trials on 'coated scrap' material using an L4 orthogonal array indicated that a
high level of factor A together with low levels of factors B and C gives the highest recovery. It was also
observed that factor C makes the largest contribution to the total sum of squares and correspondingly has a
major influence on the total variation of recovery. The experiments on 'class scrap' material employing an
L9 orthogonal array suggested that a low level of factor A, a high level of factor B and the middle level
of factor C give the best recovery, with factors B and A being the more influential factors in the total
variation of recovery. These results indicate which process parameters have a large impact on product
quality during production. In later work, we will show how the interactions between parameters can be
effectively used for process optimisation of aluminium recycling.


ACKNOWLEDGMENTS 

The  authors  gratefully  acknowledge  the  support  of  the  EPSRC  and  IMCO  Recycling  Ltd.,  particularly 
IMCO  Recycling  Ltd.  for  assistance  in  the  running  of  the  experimental  program. 


REFERENCES 

1. G. Taguchi, 1988. Introduction to Quality Engineering: Designing Quality into Products and Processes,
Asian Productivity Organisation, Japan.

2.  M.F.J.  Bohan,  T.C.  Claypole,  D.T.  Gethin,  1995.  'The  application  of  Taguchi  methods  to  the  study  of 
ink  transfer  in  heat  set  web  offset  printing',  47th  Annual  TAGA  Tech.  Conf.,  Orlando,  Florida. 

3.  P.A.  Wells,  R.E.  Andreas,  T.M.  Fox,  1995.  'Metal  recovery  enhancement  using  Taguchi  style 
experimentation',  In:  P.B.  Queneau  and  R.D.  Peterson  (eds.),  3rd  Int.  Symp.  Recycling  of  Metals  and 
Engineered  Materials,  Alabama,  269-281. 

4.  M.S.  Phadke,  1989.  Quality  Engineering  using  Robust  Design,  Prentice  Hall,  New  Jersey. 

5.  R.K.  Roy,  1990.  A  Primer  on  the  Taguchi  method,  Society  of  Manufacturing  Engineers,  Dearborn, 
Michigan,  USA 




Towards  a  Better  Understanding  of  Fuzzy  Sets 
Applied  to  Environmental  Science 

Mory  M.  Ghomshei  and  John  A.  Meech 

University  of  British  Columbia,  Department  of  Mining  and  Mineral  Process  Engineering, 
6350  Stores  Road,  Vancouver,  B.C.  V6T  1Z4,  Canada 
Email:  ghomshei@mining.ubc.ca  iam@mining.ubc.ca 


ABSTRACT 

The  fuzzy  set  of  a  concept  is  defined  by  the  distribution  function  of  the  degree  of  belief  (DoB)  in  a  certain 
qualitative  parameter  (the  concept),  over  a  range  of  variation  in  a  quantitative  or  less-qualitative  parameter 
(the scale). The concept may be expressed with different scaling parameters, and no single parameter
is necessarily unique. Therefore the form of a fuzzy set depends on the choice of scale. In the
environmental  field,  regulators,  health  authorities,  epidemiologists,  politicians,  environmentalists, 
engineers  and  the  general  public  often  have  different  definitions  for  a  concept  such  as  contamination.  The 
only  term  which  is  more  or  less  unequivocally  understood  by  all  interested  groups  is  the  final  "risk"  (often 
associated  with  dollar  value).  Proper  definition  and  scaling  of  fuzzy  sets  can  provide  a  common  language 
through  which  experts  from  different  disciplines  can  communicate  through  the  entire  process  of  risk 
assessment.  The  uncertainty  of  the  input  information  is  propagated  (but  neither  magnified  nor  dampened)  in 
a  fuzzy  approach  and  yet  the  system  output  will  remain  fuzzy  which  can  then  be  translated  into  either 
quantitative  risk  values  or  qualitative  linguistic  terms. 


THE  PROBLEM 

A  fuzzy  set  is  defined  by  the  distribution  function  of  the  Degree  Of  Belief  (DoB),  in  a  certain  qualitative 
concept,  over  the  range  of  variation  in  a  quantitative  (or  less  qualitative)  parameter  which  is  closely  related 
to  that  concept.  For  example,  consider  the  concept  "old",  which  is  a  qualitative  concept  closely  related  to 
variation  in  a  more  quantitative  parameter  "age"  [1].  The  distribution  of  DoB  or  the  membership  of  a 
certain  age  in  the  concept  "old"  is  a  fuzzy  set.  It  should  be  noted  that  the  parameter  used  to  scale  the 
qualitative  concept  "old"  is  not  necessarily  unique.  One  can  scale  the  concept  "old"  with  age,  health, 
beauty,  hormonal  activity,  feelings  of  being  old,  and  so  on.  Of  course,  the  form  of  the  fuzzy  set  would  be 
different  depending  on  the  choice  of  scaling  parameter. 

The  DoB  distribution  over  the  scaling  parameter  (i.e.,  the  fuzzy  set)  depends  as  well  on  the  definition  of  the 
fuzzy  concept.  For  example,  in  considering  "old",  the  form  of  the  related  fuzzy  set  depends  on  how  this 
concept  is  defined  (i.e.,  by  functionality,  sociological  perception  or  legal  definition).  The  legal  definition  is 
usually  linear  with  threshold  and,  in  some  contexts,  may  be  a  crisp  relationship.  Sociological  perception  is 
more  sophisticated,  possibly  a  Laplace-Gaussian  distribution,  while  functionality  is  possibly  the  most 
sophisticated,  with  an  intermediate  period  of  relaxation  or  maturity  (see  Fig.  1). 
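
The different kinds of definition can be written as membership functions of age. The Python sketch below is
illustrative only: the threshold of 65, the centre and the width are placeholder values that do not come
from the paper, a bell-shaped curve stands in for the perceived definition, and the function names are
assumptions. The functionality-based curve is not modelled here.

```python
import math


def old_legal_crisp(age, threshold=65.0):
    """Crisp legal definition: DoB jumps from 0 to 1 at the threshold."""
    return 1.0 if age >= threshold else 0.0


def old_legal_linear(age, threshold=65.0, full=100.0):
    """Linear-with-threshold definition: DoB rises linearly above the threshold."""
    if age <= threshold:
        return 0.0
    return min(1.0, (age - threshold) / (full - threshold))


def old_perceived(age, centre=100.0, width=20.0):
    """Bell-shaped (Gaussian) DoB; centre and width are hypothetical."""
    return math.exp(-0.5 * ((age - centre) / width) ** 2)


# Same scale (age normalized to 100), three different fuzzy sets.
for age in (40, 65, 80, 100):
    print(age, old_legal_crisp(age), round(old_legal_linear(age), 2),
          round(old_perceived(age), 2))
```

The same age receives a different DoB under each definition, which is the source of the
miscommunication discussed next.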

This fuzziness in the very definition of a qualitative concept is often at the origin of miscommunication
between different groups of experts collaborating with or confronting each other on a problem. In the area
of environmental studies, regulators, health authorities, epidemiologists, politicians, environmentalists,
engineers and the general public often have different definitions for (or understanding of) a concept such
as contamination. Fig. 2 is an example of how different understandings of the meaning of contamination can
lead to different fuzzy sets present in the minds of different people. An expert system which tries to
communicate  with  different  groups  of  experts  and/or  users  should  therefore  accommodate  the  subtleties 
existing  in  the  definition  or  understanding  of  a  concept.  Alternatively,  one  can  start  with  simple  or 
elementary  fuzzy  sets  related  to  certain  primary  causes  and  then  adapt  the  original  set  to  obtain  fuzzy  sets 
to  represent  more  complex  effects  as  understanding  improves  and  the  definition  becomes  more  focused. 


0-7803-5489-3/99/$10.00 ©1999 IEEE.




[Fig. 1 not reproduced; x-axis: X = age (normalized to 100); surviving panel titles: (b) "OLD" (perceived),
(c) "OLD" (legal).]


CAUSE  AND  EFFECT 

Environmental  concepts  are  often  scaled  against  different  parameters  to  relate  cause  and  effect.  As  an 
example,  consider  pollution  from  two  carcinogenic  agents  such  as  radon  gas  and  arsenic.  The  degree  of 
pollution  can  be  scaled  against: 

1. The concentration of pollutants 1 and 2 in the medium of concern, to define Cause 1 and Cause 2;

2. The individual risk (e.g., cancer) from different causes, to define Effect 1-1 and Effect 1-2;

3. The cancer rate in the population, to define the Population Risk (Effect 2) from Effects 1-1 and 1-2;

4. The health costs which derive from Effect 2, to derive Effect 3;

5. The overall socio-economic impact from Effect 3 and other competing factors, to derive Effect 4.




The  first  scale  is  the  most  independent  and  unbiased  one.  It  is  however  meaningless  if  it  is  not  coupled  with 
epidemiological  data  (the  second  scale)  and  human  geography  (the  third  scale). 


[Fig. 2 panels not reproduced; surviving panel title: (b) Contamination (regulated by limits); x-axis:
X = contaminant concentration.]

Fig. 2. Scaling the concept of "contamination" against contaminant concentration.

If pollution is scaled against "individual cancer risk" (Effect 1), the distribution function of the DoB
(fuzzy set) is likely to be linear (or, at least, assumed to be linear over the range of concern). Both
radon and arsenic pollution would therefore have the same "metric" (i.e., cancer risk), allowing their
arithmetic combination to yield an overall population risk for cancer (Effect 2). If health costs are used
for scaling, the fuzzy sets should have different "metrics", because the cancer related to arsenic is skin
cancer (95% benign) [2], while the cancer caused by radon is lung cancer. One should therefore be cautious
about naively combining intermediate effects issuing from different causes. Instead, it would be prudent to
treat each independent cause separately and only combine their effects when they have been scaled against
an unambiguously identical "metric" which, in this case, is health dollars.


FUZZY  METHOD  VERSUS  WORST  AND  BEST 

Application of fuzzy logic is especially useful when data are scarce. Consider the case of land contaminated
by different agents (e.g., arsenic, lead and/or mercury). A risk assessment study of such land needs
extensive data which can be placed into different categories as follows:

1-  concentration  of  the  contaminants  (chemical  agents)  in  the  medium  (soil  and  water) 

2-  toxicity  of  each  agent  (individual  risk) 

3-  land  use 

4-  pathway  of  each  agent  for  each  particular  land  use 

5-  health  and  economic  risks 

A risk assessment is then conducted based on rigorous statistical treatment and modeling of the data. It
must be noted that in most cases the data are actually unreliable. Highly sophisticated numerical models
will therefore have to be boiled down to very rough outputs known as worst-case and best-case scenarios.
The dilemma, then, is where reality stands between the two extreme scenarios. Depending on the spread in
the data, the best-case scenario may indicate no risk at all, while the worst-case scenario may suggest
extreme hazards, meaning the land is practically unusable. The polarity of such output leaves room for
great confusion on the part of regulators, investors and/or adjudicators. In these situations, fuzzy logic
can be very helpful in introducing a continuous risk function [3] between the best-case and worst-case
scenarios. Such a function is especially useful to those who seek or provide environmental risk insurance.

In fuzzy treatment of risk assessment, the input data are translated into fuzzy numbers and/or sets
applicable to fuzzy arithmetic [4]. One can express environmental input as a variety of fuzzy sets. The
choice of words or concepts depends on how well each fuzzy set absorbs the inaccuracies and biases in the
data while still remaining meaningful. It is therefore necessary to understand the cause and effect
relationships, to begin the process of fuzzification with information about the primary causes, and to let
the uncertainty propagate through to more complex effects. The input data into such analyses can be
expressed as fuzzy functions in which the magnitude of each parameter is coupled with a specific membership
value or Degree of Belief [5].


ANATOMY  OF  RISK  (A  REAL  EXAMPLE) 

The average concentration of a contaminant (e.g., arsenic in soil) can be expressed by a fuzzy function to
which existing data points are statistically fitted. Suppose we have 9 data points for the concentration of
arsenic at site X (Table 1). The simplest fuzzy set would be a curve in which the normalized frequency of
each data value is expressed in terms of the DoB, as in Fig. 3.

Table 1. Arsenic measurements at site X.

sample   arsenic (ppm)
1        2.5
2        5
3        1.3
4        4
5        3.5
6        2.2
7        3
8        1.7
9        3
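
A minimal Python sketch of how the nine measurements of Table 1 could be turned into such a fuzzy set. The
bin width of 1 ppm and the normalization by the most frequent bin are assumptions made for illustration;
the paper does not state how the curve of Fig. 3 was fitted, and the function name is hypothetical.

```python
import math
from collections import Counter

# Arsenic measurements at site X (ppm), copied from Table 1.
ARSENIC_PPM = [2.5, 5, 1.3, 4, 3.5, 2.2, 3, 1.7, 3]


def fuzzy_set_from_samples(samples, bin_width=1.0):
    """Return {lower bin edge: DoB}, with the DoB equal to the bin count
    divided by the count of the most populated bin."""
    counts = Counter(math.floor(x / bin_width) for x in samples)
    peak = max(counts.values())
    return {k * bin_width: n / peak for k, n in sorted(counts.items())}


arsenic_dob = fuzzy_set_from_samples(ARSENIC_PPM)
for lower_edge, dob in arsenic_dob.items():
    print(f"{lower_edge:.0f}-{lower_edge + 1:.0f} ppm: DoB = {dob:.2f}")
```

With these settings the 3-4 ppm bin holds three of the nine samples and therefore receives a DoB of 1; a
different bin width would change the shape of the set.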




Fig.  3.  Probability  distribution  of  arsenic  in  soil. 


The  toxicities  of  the  agents  are  themselves  fuzzy  numbers  or  sets  and  are  often  defined  in  relation  to  the 
affected  biota  or  human  population.  Individual  risk  is  often  expressed  as  a  range  between  upper  and  lower 
values  (thresholds).  For  the  purposes  of  our  example,  consider  the  case  of  arsenic  in  which  toxicity  can  be 
expressed  in  terms  of  an  individual  risk.  Each  arsenic  level  is  then  associated  with  the  probability  of  cancer 
in  an  individual  exposed  to  that  concentration  in  some  medium.  This  is  also  a  fuzzy  set  which  may  be  linear 
over  the  range  of  interest. 

In  order  to  relate  the  two  fuzzy  sets  (i.e.,  soil  contamination  and  individual  risk),  one  needs  a  third  function 
to  correlate  the  arsenic  in  soil  with  the  arsenic  ingested  by  an  individual  exposed  through  a  particular 
pathway.  This  correlation  is  based  either  on  epidemiological  data  or  calculated  through  pathway  models. 
Reliable epidemiological data relating individual risk (e.g., cancer) to the concentration of contaminant in
soil are scarce. The individual risk is therefore related to sources (i.e., primary causes) of contamination
through a pathway model which defines the process through which a certain contaminant finds its way to
an individual at risk [2]. There are usually several pathways associated with each contaminant. For
example, a contaminant in soil may be ingested (through hand-to-mouth contact) by children playing on the
ground (pathway 1), or it may be ingested through inhalation of dust arising from the contaminated soil
(pathway 2), or it may be leached into the hydrological system under low-pH conditions and then ingested
through drinking water (pathway 3), to mention a few. In a fuzzy approach to risk assessment, pathways
can  be  modeled  by  fuzzy  functions  (instead  of  assuming  maximum  and  minimum  values).  Individual  risks 
for  each  pathway  model  can  then  be  multiplied  by  the  affected  population  (itself,  a  fuzzy  number)  to 
calculate  the  overall  population  risk  and  associated  costs. 

Let us go back to our case of arsenic at site X and consider the two most important pathways:

1. hand-to-mouth (ingestion directly from soil)

2. dust inhalation (ingestion through air)

It is evident that the affected population is not necessarily the same for the two pathways. The individual
risk associated with each pathway should therefore be normalized to the size of the affected population
before combining the effects. Here we examine the first pathway in detail. Fig. 3 defines the original cause
(i.e., arsenic concentration) using a fuzzy set. The hand-to-mouth pathway [6] implies a linear correlation
between arsenic in the soil (cause) and ingested arsenic. The slope of the correlation has been determined
from epidemiological data and models [7, 8] (see Fig. 4).




Fig.  4.  Arsenic  ingested  from  contaminated  soil  through  hand-to-mouth  pathway. 


Fig.  5.  Probability  of  arsenic  ingestion  through  hand-to-mouth  pathway. 

Belief in individual risk can therefore be linearly linked to the original cause [9], so the fuzzy set of
the effect has the same shape as that of the cause (Figs. 5 and 6). Only the "metrics" are different.
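
Because the pathway is linear, mapping the cause onto the effect only rescales the horizontal axis and
leaves every DoB unchanged. A minimal Python sketch, in which the fuzzy set, the slope of 0.5 and the
function name are all invented for illustration:

```python
def rescale_fuzzy_set(fuzzy_set, slope):
    """Map each value x to slope * x while keeping its DoB."""
    return {slope * x: dob for x, dob in fuzzy_set.items()}


# Hypothetical fuzzy set: arsenic in soil (ppm) -> DoB.
soil_arsenic = {1.0: 0.5, 3.0: 1.0, 5.0: 0.5}

# Hypothetical slope linking arsenic in soil to ingested arsenic.
ingested_arsenic = rescale_fuzzy_set(soil_arsenic, slope=0.5)
print(ingested_arsenic)
```

The DoB values are identical before and after the mapping; only the "metric" on the horizontal axis has
changed.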


Fig.  6.  Degree  of  Belief  that  the  individual  risk  of  cancer  is  0.00005. 

The population exposed to the "hand-to-mouth" pathway are children, whose numbers depend on the total
residential units and the number of children in each unit. Based on interviews with urban planners, the
affected population can be assessed in fuzzy terms (see Fig. 7). Note that conventional risk assessment
typically uses the maximum population, which leads to overestimation of the total population risk.




[Plot not reproduced; x-axis: population exposed to hand-to-mouth ingestion, 0 to 400.]

Fig. 7. Degree of Belief that a particular population size is exposed to arsenic by "hand-to-mouth"
ingestion.

The total population risk is calculated by multiplying the individual risk by the exposed population size.
Since both components are expressed as fuzzy sets, the product is defined as a fuzzy set with three
dimensions [4]: individual risk, population size and DoB. Fig. 8 shows the results of the fuzzy
multiplication of individual risk by exposed population size. The outer locus of points describes the range
of possible values of the total population risk. This envelope can be considered as the region within which
any specific distribution of population risk measurements will be located.
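
One common way to carry out such an operation on discretised fuzzy sets is Zadeh's extension principle:
every pair of values is combined, the pair receives the smaller of the two DoBs, and each result keeps the
largest DoB found for it, which traces the outer envelope. The Python sketch below assumes that this is the
intended operation; the function name is an assumption and the input sets are hypothetical, with individual
risk written as cancers per 100,000 people to keep the arithmetic in integers.

```python
import operator


def fuzzy_combine(set_a, set_b, op):
    """Combine two discrete fuzzy sets (value -> DoB) with a binary operation.

    Each pair (x, y) yields op(x, y) with DoB min(DoB_a(x), DoB_b(y));
    when several pairs give the same result, the largest DoB is kept.
    """
    result = {}
    for x, dob_x in set_a.items():
        for y, dob_y in set_b.items():
            z = op(x, y)
            result[z] = max(result.get(z, 0.0), min(dob_x, dob_y))
    return dict(sorted(result.items()))


# Hypothetical inputs (not taken from the paper).
individual_risk = {1: 0.5, 5: 1.0, 10: 0.5}    # cancers per 100,000 people
population = {100: 0.5, 200: 1.0, 300: 0.5}    # exposed children

# Keys are risk * population, i.e. expected cancers multiplied by 100,000.
population_risk = fuzzy_combine(individual_risk, population, operator.mul)
for value, dob in population_risk.items():
    print(f"{value / 100000:.3f} cancers: DoB = {dob}")
```

Passing `operator.add` instead of `operator.mul` gives the corresponding fuzzy addition, which is the
operation needed when the risks of separate pathways are summed.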


Fig. 8. The population risk distribution for pathway 1 will be located within the envelope defined by the
uppermost locus of points (heavy line).

The effect of pathway 2 was assessed using the same process. Note that the affected population (adults and
children) is significantly larger than that of pathway 1, but countering this is the fact that total
ingestion through air is significantly lower than through hand-to-mouth contact. The final risks of the two
pathways are therefore comparable in value (Fig. 9).


Fig. 9. The population risk distribution for pathway 2 will be located within the envelope defined by the
uppermost locus of points (heavy line).

The final population risk is then calculated by simple addition of the two fuzzy sets (Fig. 10).




[Plot not reproduced; x-axis: risk for pathways 1 and 2 (cancers).]

Fig. 10. Total cancer risk distribution associated with arsenic exposure will be located within the
envelope defined by the uppermost locus of points (heavy line).

The final population risk related to all pathways can therefore be expressed as a fuzzy set rather than as
two extreme values (worst and best) as given in conventional risk assessment. Where different types of
health risk are known to be involved, the population risk for each one should be converted to health costs
before combining the effects of all health risks.


DISCUSSION 

Definitions and data related to environmental concepts are fuzzy in nature. Risk assessment is a common
methodology to quantify (or qualify) the concern. In a fuzzy logic approach to risk assessment, instead of
adopting the worst-case scenario, all scenarios are given a fair weight by associating a degree of belief
(DoB) with each possible scenario. All scenarios are therefore taken into account and uncertainties are
propagated through the system, making the output less biased. Fuzzy presentation of the input and output
information can provide a simple, yet intelligent, medium of communication.

Contrary  to  its  name,  application  of  fuzzy  sets  and  fuzzy  arithmetic  to  environmental  studies  can  help 
reduce  the  fuzziness  in  both  output  results  and  the  investigation  procedures.  It  requires  defining  risk-based 
fuzzy  sets  to  replace  a  variety  of  different  fuzzy  sets  present  (but  unexpressed)  in  the  minds  of  the  different 
interested  parties  as  shown  in  Figure  2.  We  are  continuing  our  work  to  examine  ways  to  combine  the  fuzzy 
analyses  depicted  in  Figure  2  with  the  fuzzy  arithmetic  presented  in  the  example  application. 

We  believe  that  a  fuzzy  expression  of  risk  can  provide  a  harmonized  baseline  for  regulations  and  permits. 
Causal  association  of  the  risks,  often  reflected  in  the  shape  of  the  fuzziness  involved  in  the  process,  can 
facilitate  understanding  of  the  system,  which  otherwise  would  simply  be  a  black  box. 

REFERENCES 

1. Harris, C.A. and Meech, J.A., 1987. "Fuzzy Logic: A Potential Control Technique for Mineral
Processing"; CIM Bulletin, 80(905), 51-59.

2. W.H.O. (World Health Organization), 1981. Environmental Health Criteria 18 - ARSENIC.
Environmental Health Criteria Series, Geneva, 41-123.

3. Kosko, B., 1994. Fuzzy systems as universal approximators. IEEE Trans. Computers, 43(11), 1329-1333.

4. Kaufmann, A. and Gupta, M.M., 1991. Introduction to fuzzy arithmetic: theory and application. Van
Nostrand Reinhold, New York, 361.

5. Donato, J.M., Barbieri, E., 1995. Mathematical representation of fuzzy membership functions. IEEE
Spectrum, 290-294.

6. Binder, S., Forney, D., Kaye, W., Paschal, D., 1987. Arsenic Exposure in Children Living Near a Former
Copper Smelter. Bulletin Environmental Contamination, 39: 114-121.

7. BCE (British Columbia Ministry of Environment), 1990. Human Exposure Reference Values. Internal
report, file 10-9-2.

8. Travis, C.C., Richter, S.A., Crouch, E.A.C., Wilson, R., Klema, E.D., 1987. Cancer Risk Management.
Environ. Sci. Tech., 21(5), 415-420.

9. EPA, 1984. Health Assessment Document for Inorganic Arsenic. Washington, D.C.




Intelligence  in  Rolling  Processes 




Data  Mining  and  State  Monitoring  in  Hot  Rolling 

L. Cser*, A.S. Korhonen**, J. Gulyas***, P. Mantyla****,
O. Simula**, Gy. Reiss***, P. Ruha*****

*Bay Zoltan Institute for Logistics and Production Systems,
H3519 Miskolc-Tapolca, Hungary, recently
**Helsinki University of Technology, P.O. Box 6200, FIN-02015 HUT, Finland
***University of Miskolc, H3515 Miskolc-Egyetemvaros, Hungary
****University of Oulu, Finland
*****Rautaruukki Steel, Finland

ABSTRACT 

This paper reviews state monitoring in hot rolling and presents a new concept of state monitoring. Based on
a detailed analysis of all factors, a state monitor is proposed. A system state corresponds to the proper
product quality. If the system leaves the area of required quality in the state space, a signal is given
together with an evaluation of the situation. Self-Organising Maps (SOM) are especially suitable for
analysing the very complex process of hot rolling. Application of SOM helps to discover hidden dependencies
influencing the quality parameters, such as flatness, profile, thickness and width deviation as well as
wedge and surface quality. Results from the analysis of more than 70 parameters in 16,000 strips gave the
state space used in state monitoring based on on-line data sampling. The coloured visualisation map shows
the state space, enabling prediction of product quality.

INTRODUCTION 

Cutting  costs  and  increasing  added-value  of  steel  products  using  new  production  methods  and  advanced 
control  systems  are  key  factors  in  the  competitiveness  of  European  steel  producers.  Customers  require 
thinner  and  wider  plate  with  smaller  tolerances.  Constructions  are  made  lighter  and  assembled  on  automatic 
lines,  which  can  only  tolerate  minor  variations  in  the  spring-back  of  incoming  sheet  parts.  Short  delivery 
times  which  reduce  the  length  of  rolling  campaigns  also  increase  the  need  for  continuous  set-ups, 
adjustments,  control  and  monitoring  of  the  process.  To  meet  the  challenge  of  this  steadily-growing  pressure 
to  improve  product  quality,  rolling  mills  employ  extensive  automation  and  sophisticated  on-line  data 
sampling  techniques.  However,  the  sheet  quality  is  influenced  by  the  entire  "life  history"  of  the  rolled  strip. 

CONTROL  AND  MONITORING  IN  ROLLING  MILL 

Ten years ago, thickness deviations for 3-5 mm hot rolled strip were usually -0.05 mm; recently this has
decreased to 0.015 mm. The same is true for strip width. Extra-wide strip, trimmed away from the final
product, was typically 13-15 mm in the 80s. Today it is ~10 mm, but the target, representing the world's
best practice, is 4 mm. This progress, experienced only recently, results from very intensive development.

Historical  background 

In the middle of the twentieth century, sheet quality was a product of the teamwork of very highly-skilled
labour, so-called mill supervisors, or "rollers", led by mill superintendents. Two mill supervisors stayed
at the furnaces, each on one side, controlling the volume of gas and air supplied to the furnaces, opening
the doors, and controlling bar movement on the transfer rolls. In the roughing train a "roller" controlled
each of the vertical edgers (transfer bar width, speed and direction of the traction motors). Two mill
supervisors were located at the roughing stand, one controlling the roll gap and transfer speed; the other
handled guidance tasks: the bar transfer as well as the hydraulic descaler. The crop shear machine was also
controlled by a skilled worker. The mill superintendents stayed at the finishing train. One of them
controlled the roll gaps, speed, and tension in finishing stands #1, #2 and #3; the other performed the
same operations for stands #4, #5 and #6 as well as the cooling line. Dynamic gap control on the drive and
change sides of each finishing stand was done by the stand controllers. At the end of the finishing train,
a coil supervisor controlled each of the coilers. Automatic rolling mill control took over these tasks,
dividing them into four main groups: optimal roll set-up, process monitoring, material flow control, and
data recording and report generation. The tasks are organised into the hierarchy shown in Figure 1 [1].

LEVEL III: PLANT COORDINATION AND SUPPORT SYSTEM
- Production planning
- Production scheduling
- Production tracking
- Inventory control
- Data collection
- Data analysis

LEVEL II: MILL SETUP SYSTEM
- Adaptive process models for thickness, crown, flatness, temperature and material properties
- High level tracking
- Production logs
- Engineering logs
- Long term quality assurance (optional)
- Off-line simulation facilities (optional)

LEVEL I: TECHNOLOGICAL CONTROL LOOPS
- Technological closed loops for thickness, crown, flatness and temperature
- Advanced sequencing
- Low level tracking
- Alarms and event logs
- Measured data acquisition
- Short term quality assurance
- Medium term quality assurance (optional)

LEVEL 0: BASIC AUTOMATION
- Subsystems for drive controls
- Hardware

Fig. 1. Hierarchical control of rolling mill [1].

Neural  networks  in  mill  control 

The use of Artificial Neural Networks (ANN) in rolling started at the beginning of the 1990s and the
number of publications is growing rapidly. Reports on successful applications will probably extend the
interest in this field as well [2, 3, 4, 5]. Prediction of process variables using measurements from
previous process phases has been a common way to utilise ANNs in industrial applications. In the automation
and control of strip quality and in the determination of mill settings needed for incoming strip before the
strip actually enters the mill, ANNs have been used based on both measured and simulated data. An efficient
process model should be able to predict rolling forces, torque, material properties etc. Such models must
be accurate, reliable and adaptable in real time. By using ANNs, these goals can be reached provided
sufficient process data are available. In addition, when post-calculation is taken into account, which
means compensating the pre-calculation error by continuously adapting the models to the real-time
situation, neural networks can do both modelling and adaptation and perform them equally well [6].




For control tasks, reasonable task-sharing between mathematical models and neural network-based corrections is necessary [7]:

• The on-line computations should be performed by neural networks, but

• Physically correct mathematical modelling should be utilised wherever possible.

Validating a full neural network model is very difficult, because to the engineer the model remains a "black box". The tasks solved by ANNs are as follows:

• Prediction of the width in the finishing mill,

• Correction of the strip or plate temperature calculated by analytical models.

Networks used in rolling mill control are usually trained off-line in development laboratories, after which the model can be installed to control or optimise the process and/or used for off-line analysis. In some applications, the ANN model is trained on-line, which in most cases is a highly demanding task. Currently, ANNs are rather widely and successfully used in rolling mills, as can be seen in Fig. 2.


Fig.  2.  Rolling  mills  with  neural  network  control  [7] 

DATA  AND  DATA  MINING  IN  HOT  ROLLING 

In order to set up the models for fine-tuning of rolling mill control, analysis of all influencing factors is necessary. One way to analyse the factors and their influences is by laboratory experiments. These are rather expensive and carry all the dangers of physical modelling (scale-up factors, differences in boundary conditions, etc.). Another approach is to analyse industrial data, measured on-line during the actual rolling process, using methods such as Data Mining, especially the visualisation possibilities of Self-Organising Maps.

The data collected as required by ISO 900X are a rich source of information about this extremely complex process. The rolling process itself, consisting of furnaces, transport rolls, rolling stands of the roughing and finishing trains, a water-cooling system, lubrication, etc., is very complex. In addition to the current state of the mill, the process conditions and product properties are affected by other factors, such as the chemistry of the material to be rolled and the history of each slab during casting and reheating, and before and during rolling. The factors influencing quality interact in very complex ways. When one of them changes (e.g., temperature), the rolling force is affected, but so are the torque, surface properties and kinematic boundary conditions. The same is valid for the dimensional quality parameters (thickness, profile, flatness, width and wedge), as well as the surface roughness.

As can be seen in Figure 1, in a modern rolling mill each production unit, such as a furnace, rolling stand or cooling line, has its own computer control with sensors. The rolling mill automation system forms a hierarchy of computers and programmable controllers. These range in function from simple closed-loop regulators (Level 0) for tasks such as actuator position and motor speed control, through intermediate controllers (Level 1) for tasks such as sequence management and complex closed-loop regulation, to high-level computers (Level 2, often called the optimisation system, and Level 3) for tasks such as material tracking, production unit set-point calculation, quality monitoring and production control. Process- and quality-critical data are collected from sensors by all levels of the automation system, and these measurements flow up the hierarchy from Level 0 to Level 3. Great numbers of controlling and data-sampling computers work together, storing the data for short- or long-term analysis. For in-bar analysis and modelling, huge amounts of high-frequency data must be collected and stored. In uncompressed format, this would require a huge amount of space; therefore advanced compression and feature extraction methods are needed. In feature extraction, ANNs together with other data mining methods are very promising tools. In a typical modern hot rolling mill, a full temperature map is created in and after the furnace, before the roughing stand, before the first finishing stand, after the first finishing stand, after the last finishing stand, and before the coiler. After the last stand of the finishing train, the final dimensional parameters are measured. In each stand the roll separating force, roll bending force, tension, strip bending and roll shifting values are measured and stored (see Fig. 3). Grain size and mechanical properties will be added later.

The measurements listed above run continuously during the rolling of each strip or plate. The data-sampling interval is between 0.2 and 2 seconds. Temperature is measured across the whole width, before the finishing mill, on both the upper and lower sides of the strip. This means that millions of measurements are made for each strip. In the hot strip mill considered in this study, hundreds of strips are rolled daily.

For the control system, as well as for product records, only some averages and standard deviations are needed. Other sampled data must be deleted dynamically: some are deleted immediately after averaging, some are stored for a few weeks or months. Data are stored in different computers on different hierarchical levels of the control system. They can be collected using either the strip ID number or a timestamp. Since the measurements are made at different places on the strip at different times, an additional transformation is necessary in order to synchronise the values along the length of the strip.
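The synchronisation step can be illustrated with a minimal sketch. It assumes (this is not stated in the paper) that for every sensor the times at which the strip head and tail pass are known, and that the strip speed at the sensor is roughly constant while the strip passes. The function name, arguments and readings are hypothetical.

```python
import numpy as np

def to_length_coordinate(t, values, t_head, t_tail, grid):
    """Resample one sensor signal onto a common 0..1 strip-length grid.

    t, values      : sample times and readings at this sensor
    t_head, t_tail : times at which strip head and tail pass the sensor
    grid           : common non-dimensional positions (0 = head, 1 = tail)
    Assumes constant strip speed at the sensor (a simplification).
    """
    t = np.asarray(t, dtype=float)
    values = np.asarray(values, dtype=float)
    on_strip = (t >= t_head) & (t <= t_tail)
    # Non-dimensional position of each sample along the strip.
    s = (t[on_strip] - t_head) / (t_tail - t_head)
    return np.interp(grid, s, values[on_strip])

grid = np.linspace(0.0, 1.0, 5)
# Hypothetical readings: temperature before the mill, thickness after it.
temp = to_length_coordinate([0, 1, 2, 3, 4], [900, 905, 910, 905, 900], 0, 4, grid)
thick = to_length_coordinate(
    [10, 12, 14, 16, 18, 20, 22, 24, 26],
    [3.60, 3.61, 3.62, 3.61, 3.60, 3.59, 3.60, 3.61, 3.60],
    10, 26, grid)
# One row per strip position: both signals now share the same coordinate.
aligned = np.column_stack([grid, temp, thick])
```

Because the strip is longer after the mill, the second sensor sees it for a longer time; dividing by the passage time removes that difference, so rows of `aligned` refer to the same material position.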


[Figure 3: data sources along the line, grouped as statistically preprocessed data (min, max, average and standard deviation for head, body and tail), sampled data (1/sec) and scanned data; about 40 Mbyte per strip.]

Fig. 3. On-line measurements (data sources) in hot rolling
(Q - casting machines, F_i - furnaces, T_bar, W_bar - bar temperature and bar width, T - temperatures of the strip head, body and tail, F_i (F, F_b, F_t) - force vector in stand i (roll separating force, roll bending force, strip tension force), T_n - temperature measured before the stand, H, W, We, Fl, Pr - geometric quality parameters: thickness, width, wedge, flatness, profile, T_c - coiling temperature)


Self-Organising  Maps  in  Hot  Rolling 

The Self-Organising Map (SOM) [8] is a neural network algorithm based on unsupervised learning. Unlike supervised learning methods, the SOM can be used for clustering data without knowing the class memberships of the input data. It can thus be used to detect features inherent to the problem. The projection on the component planes can be interpreted as slicing the n-dimensional model vectors of the map into component planes. Each component plane consists of the values of a single vector component in all map units. In the case of hot rolling, the measured data characterising the rolling of a strip form a data vector. The components of these vectors change from strip to strip. The SOM component planes give the clustering of each component separately.
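As an illustration of model vectors and component planes, the following is a minimal sketch of an on-line SOM on a rectangular grid. It is not the implementation used in the study; the learning-rate and neighbourhood schedules, the function names and the random "strip" data are assumptions chosen for brevity.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=2000, seed=0):
    """Train a small rectangular SOM; returns a (rows, cols, dim) codebook."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    dim = data.shape[1]
    codebook = rng.uniform(data.min(axis=0), data.max(axis=0),
                           size=(rows, cols, dim))
    # Grid coordinates of every map unit, used by the neighbourhood function.
    gy, gx = np.mgrid[0:rows, 0:cols]
    sigma0 = max(rows, cols) / 2.0
    for it in range(n_iter):
        frac = it / n_iter
        alpha = 0.5 * (1.0 - frac)            # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5   # shrinking neighbourhood radius
        x = data[rng.integers(len(data))]
        # Best-matching unit: the model vector closest to the sample.
        dist = np.linalg.norm(codebook - x, axis=2)
        by, bx = np.unravel_index(np.argmin(dist), dist.shape)
        # Gaussian neighbourhood pulls nearby units towards the sample.
        h = np.exp(-((gy - by) ** 2 + (gx - bx) ** 2) / (2.0 * sigma ** 2))
        codebook += alpha * h[:, :, None] * (x - codebook)
    return codebook

def component_plane(codebook, k):
    """Values of vector component k in all map units."""
    return codebook[:, :, k]

# Hypothetical data: 200 "strips", each described by 3 measured values.
strips = np.random.default_rng(1).normal(size=(200, 3))
codebook = train_som(strips, rows=6, cols=8, n_iter=500)
plane0 = component_plane(codebook, 0)
```

Each component plane is simply one slice of the codebook, so a data set with 70 measured values per strip would give 70 planes over the same grid.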

Component planes can also be used for correlation hunting: discovering hidden co-incidences. Correlation between components can be seen as similar patterns in identical positions of the component planes. Pattern matching is something that the human eye is very good at, and it is further enhanced by the regular shape of the map grid. Correlation hunting with component planes is easy: one selects the "suspicious" component combinations for further investigation. An advantage of the SOM is that it is not necessary to know the character of the correlation, which can also be non-linear. Similar patterns do not imply a causal connection! They can also be caused by a third factor. The final judgement can be made only by a human expert.
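A rough automatic pre-screening of component planes can be sketched as follows. It uses the Pearson correlation between planes, which is an assumption made here for simplicity: it only detects similar or mirrored patterns of a roughly linear kind, so it complements visual inspection rather than replacing it. The codebook and component names are hypothetical.

```python
import numpy as np

def similar_planes(codebook, names, threshold=0.8):
    """List component pairs whose planes show similar (or mirrored) patterns."""
    rows, cols, dim = codebook.shape
    planes = codebook.reshape(rows * cols, dim)
    r = np.corrcoef(planes, rowvar=False)
    pairs = []
    for a in range(dim):
        for b in range(a + 1, dim):
            if abs(r[a, b]) >= threshold:
                pairs.append((names[a], names[b], float(r[a, b])))
    return pairs

# Hypothetical 4x4 codebook: "force" mirrors "temperature", "width" does not.
gy, gx = np.mgrid[0:4, 0:4].astype(float)
codebook = np.stack([gx, -2.0 * gx + 1.0, gy], axis=2)
pairs = similar_planes(codebook, ["temperature", "force", "width"])
```

The pairs returned are only candidates; as the text stresses, a similar pattern may stem from a third factor.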

Results  of  Analysis 

A study of ~16,000 strips, covering 70 measured values for each strip, could deliver information on the process and the mill. Some interesting and useful co-incidences between the technical parameters, enabling quality improvement, have been published [9, 10]. Since the dimensional quality parameters are closely connected, the question arose whether the elevated requirements for strip dimensions contradict each other: is it possible to satisfy all elevated requirements with the given rolling mill?

One of the advantages of the SOM is that it can deal with discrete variables. Figure 4.b shows an attempt to bring all geometric quality parameters into correspondence with the temperature of the last finishing operation. The maps contain the information in binary terms (whether the strips satisfy the stricter requirements or not). The clear areas of the component planes are those corresponding to the goal requirements (width deviation 8 mm < ΔW < 12 mm, thickness deviation -0.01 mm < ΔH < 0.01 mm, flatness deviation -10 < I < 0, profile deviation 20 µm < Δ < 100 µm). The area in which all elevated quality requirements are satisfied simultaneously is shown in white on the plane of intersections. The size of this area shows that the investigated rolling mill can produce the required quality rather easily from a technical point of view (the other grey scales correspond to areas where only some requirements are satisfied). Comparison of the shape of the white area with the pattern of the finishing temperature shows very few similarities. It can therefore be seen that achieving the highest dimensional accuracy is not only a problem of temperature control.
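The plane of intersections can be sketched as a per-unit count of satisfied requirements. The goal windows below are the ones quoted in the text; the dictionary layout, the function name and the 2x2 example planes are hypothetical.

```python
import numpy as np

# Goal windows quoted in the text (units as given there).
LIMITS = {
    "width": (8.0, 12.0),        # mm
    "thickness": (-0.01, 0.01),  # mm
    "flatness": (-10.0, 0.0),
    "profile": (20.0, 100.0),    # micrometres
}

def intersection_map(planes):
    """planes: dict name -> 2-D array of deviations, one value per map unit.

    Returns the number of satisfied requirements per unit and the mask of
    units where all requirements hold (the "white" area).
    """
    ok = [(planes[k] > lo) & (planes[k] < hi) for k, (lo, hi) in LIMITS.items()]
    count = np.sum(ok, axis=0)
    return count, count == len(LIMITS)

# Hypothetical 2x2 component planes.
planes = {
    "width": np.array([[10.0, 10.0], [13.0, 10.0]]),
    "thickness": np.array([[0.0, 0.02], [0.0, 0.0]]),
    "flatness": np.array([[-5.0, -5.0], [-5.0, 5.0]]),
    "profile": np.array([[50.0, 50.0], [50.0, 50.0]]),
}
count, all_ok = intersection_map(planes)
```

Intermediate counts correspond to the grey levels of the figure, where only some requirements are met.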

ON-LINE  STATE  MONITORING 

Using the SOM, after the properties of the prototype vectors have been examined, new data can be analysed. The term "data" can refer to whole data sets or to single data samples. The main question to be answered is: which part of the mapped distribution corresponds best to the given data? In other words: where are the data samples located on the map?

The data sets that are the easiest cases for quality analysis are the geometric quality parameters and the intersection of elevated requirements. Figure 4.a shows a typical diagram of an on-line geometric quality control measurement. A data set collected from the geometric quality parameters measured at one strip location corresponds to a particular system state, described by a point in the state space. The point can be found in each component map and in the intersection map. By dynamically seeking the location of system states in the component planes, changes in the system state can be followed. The time series of points corresponding to the sampled data sets shows how the system "moves" in state space. Data for each rolled strip require transformation into non-dimensional length co-ordinates. This is done because mill sensors are located at different positions along the mill line, so measurements are recorded at different times in the processing history of the strip; in addition, the strip is continually being elongated due to thickness reduction, hence the length of the strip is not constant during rolling. Some sampled data points corresponding to the dynamic state changes of the rolling mill, shown as a time series in Figure 4.a, can be seen in Figure 4.b, indicated by the arrows. Figure 5 shows the position of state monitoring in the set-up and control model of a rolling mill.
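Following the system state on the map amounts to finding, for each new sample, the best-matching unit of a trained map. A minimal sketch under that reading is given below; the function name and the toy codebook are hypothetical.

```python
import numpy as np

def bmu_trajectory(codebook, samples):
    """Return the (row, col) of the best-matching unit for each sample."""
    rows, cols, dim = codebook.shape
    flat = codebook.reshape(rows * cols, dim)
    samples = np.asarray(samples, dtype=float)
    # Distance from every sample to every model vector.
    d = np.linalg.norm(samples[:, None, :] - flat[None, :, :], axis=2)
    idx = np.argmin(d, axis=1)
    return np.column_stack(np.unravel_index(idx, (rows, cols)))

# Hypothetical 2x3 map whose model vectors equal their grid coordinates.
gy, gx = np.mgrid[0:2, 0:3]
codebook = np.stack([gy, gx], axis=2).astype(float)
# Three consecutive samples along one strip.
trajectory = bmu_trajectory(codebook, [[0.1, 0.0], [0.9, 1.1], [1.0, 2.2]])
```

Plotting the successive rows of `trajectory` over a component plane or the intersection map gives the kind of arrow path described for Figure 4.b.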

CONCLUSION 

• Self-Organising Maps are an effective method for discovering hidden dependencies among the technical parameters in a fully automated environment. The SOM makes it possible to separate relevant and irrelevant factors.

• Maps which assess dimensional quality parameters clearly show system-state changes during strip rolling.

• A new method for state monitoring and prediction of strip quality, based on Self-Organising Maps, has been introduced.


[Figure 4.a: plot titled "Flatness, thickness and width deviations", with curves for thickness deviation, flatness L1, flatness L2 and width deviation.]

Fig. 4.a. Measured change of geometric quality parameters.

[Figure 4.b: component plane "profile-dev-avg-body" and the map of intersections, whose legend marks combinations of satisfied requirements for width (w), thickness (h), flatness (fl) and profile (pr).]

Fig. 4.b. Movement of the system in state space.

[Figure 5: block diagram; recoverable labels: "Feed Forward Info", "PDI: primary data from host computer or operator's pulpit", "Entry strip data".]

Fig. 5. Mill set-up model with the indication of the expected state [1]


ACKNOWLEDGEMENT 

The authors would like to thank all partners for their contributions, especially Dr. J. Larkiola (VTT), Dr. P. Myllykoski, Mr. J. Ahola (Helsinki University of Technology), Mr. L. Arvai and Mr. David C. Martin (University of Oulu).

REFERENCES 

1 Schultze, D.: Development and Application of Process Models, Technologies for the Enhancement of Rolling Mills and Processing Lines, MANNESMANN DEMAG.

2 Korhonen, A.S., Larkiola, J., 1997. Proceedings of IPMM'97, Gold Coast, Australia, July 1997.

3 Montesi, M., Trivella, F. and Brambilla, A., 1996. Proc. of EANN'96, 99-102, 17-19 June, London, UK.

4 Myllykoski, P. and Larkiola, J., 1996. Report TKK-MAK-MML 1/96.

5 Cser, L., Korhonen, A.S., Simula, O., Larkiola, J., Myllykoski, P., Ahola, J., 1998. Knowledge Based Methods in Modelling of Hot Rolling of Steel Strip, Proceedings of the ICME 98, Capri, Italy, 265-270.

6 Portman, F., Lindhoff, D., Sorgel, G., Gramckow, O., 1995. Iron and Steel Engineer, February, 33-36.

7 Gramckow, O., Jansen, M., Feldkeller, B., 1998. Anwendung Neuronaler Netze für die Prozeßsteuerung, Tagungsband MEFORM 98, 25-27 February, 1-23.

8 Kohonen, T., 1995. Self-Organizing Maps, Springer, Berlin, Heidelberg.

9 Myllykoski, P.T., Larkiola, J.E., Korhonen, A.S., Cser, L., 1999. Predicting and Modelling Flatness with Neural Networks, Proceedings of the 2nd ESAFORM Conference, Guimaraes, Portugal.

10 Cser, L., Korhonen, A.S., Mantyla, P., Simula, O., 1999. Data Mining in Improving the Geometric Quality Parameters of Hot Rolled Strips, Proc. Inter. Conf. on Quality Manuf., Stellenbosch, South Africa, 8-16.




DETERMINATION  OF  THE  THICKNESS  CONTROL  PARAMETERS 
OF  THE  ROLLING  PROCESS  THROUGH  THE  SENSITIVITY 
METHOD,  USING  NEURAL  NETWORKS 

L.E.  Zarate*,  H.  Helman** 

*Department  of  Computer  Science, 

Pontifical  Catholic  University  of  Minas  Gerais,  Brazil 
Email:  zarate@brhs.com.br 

**Department  of  Metallurgical  and  Materials  Engineering, 

Federal  University  of  Minas  Gerais 
Email:  hhelman@demet.ufmg.br 


ABSTRACT 

The governing equation of a single-stand rolling mill is a non-linear function of several parameters (entry thickness, front and back tensions, average yield stress and friction coefficient, among others). Any alteration in one of them will cause alterations in the rolling load and, consequently, in the outgoing thickness. This paper presents a method for calculating the appropriate adjustment of the three control parameters (roll gap, front or back tensions), using the sensitivity equation of the process, obtained by differentiating a neural network.


INTRODUCTION 

The governing equation of a single-stand rolling mill is a non-linear function of several parameters (1). Any alteration in any of them, i.e. the entry thickness ($h_i$), the front ($t_f$) or back ($t_b$) tensions, the average yield stress ($\bar{y}$) or the friction coefficient ($\mu$), will cause alterations in the rolling load ($P$) and, consequently, in the outgoing thickness ($h_o$).

$h_o = f(P, h_i, t_b, t_f, \mu, \bar{y}, E, R, W, M)$    1.

where:

E = Young's modulus of the strip material
R = roll radius
W = strip width
M = rolling mill rigidity modulus

When such alterations occur in the rolling process, three control parameters are mainly used to restore the outgoing thickness and so ensure the condition $\Delta h_o = 0$: the roll gap and the front and/or back tensions. In the present work, a method for calculating the appropriate adjustment of the three control parameters is presented. The method uses the sensitivity equation of the process, obtained by differentiating a neural network.

The work is organised in five sections. In the first section, the representation of the rolling process operation by means of artificial neural networks (ANN) is presented. In the second section, the basis for calculating the sensitivity equations through differentiation of the neural network is described. In the third section, the method for calculating the parameter adjustment when alterations in the rolling process occur is presented. Finally, an application to a strip rolling case, results and discussion are presented.


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




REPRESENTATION  OF  THE  ROLLING  MILL  OPERATION  THROUGH  ANN 

In  this  case,  a  back-propagation  neural  network,  with  six  entries  (N=6),  two  exits  (M=2)  and  one  hidden 
layer  with  13  neurons  (2N+1),  is  used.  A  sigmoid  function  was  selected  as  the  activating  function. 


$(h_i, g, \mu, t_b, t_f, \bar{y}) \;\rightarrow\; \text{Neural Network} \;\rightarrow\; (h_o, P)$    2.


Generally, the largest effort in training a neural network lies in collecting and pre-processing the network input data. The pre-processing operation consists of normalizing the data in such a way that the input and output values lie within the 0 to 1 range.

The following procedure was adopted to normalize the input data before using it in the ANN structure:

• In order to improve convergence of the ANN training process, the normalization interval [0, 1] was reduced to [0.2, 0.8].

• The data were normalized through the following formula:

$L_n = (L_o - L_{min}) / (L_{max} - L_{min})$    3.

where $L_n$ is the normalized value, $L_o$ is the value to normalize, and $L_{min}$ and $L_{max}$ are the minimum and maximum variable values, respectively.

• $L_{min}$ and $L_{max}$ were computed as follows:

$L_{min} = (4 \times LimiteInf - LimiteSup) / 3$    4.a.

$L_{max} = (LimiteInf - 0.8 \times L_{min}) / 0.2$    4.b.

where LimiteInf and LimiteSup are the lower and upper limits of the variable, so that Equation 3 maps them to 0.2 and 0.8, respectively.
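A short numerical check of Equations 3, 4.a and 4.b may help. The sketch below assumes, purely for illustration, a variable whose lower and upper limits are 4.6 and 5.4 (a nominal value of 5.0 with a variation of 8%); the function names are hypothetical.

```python
def normalization_limits(limite_inf, limite_sup):
    """Extended limits so that the variable range maps onto [0.2, 0.8]."""
    l_min = (4.0 * limite_inf - limite_sup) / 3.0   # Equation 4.a
    l_max = (limite_inf - 0.8 * l_min) / 0.2        # Equation 4.b
    return l_min, l_max

def normalize(value, l_min, l_max):
    """Equation 3."""
    return (value - l_min) / (l_max - l_min)

# Assumed variable limits (illustrative only).
l_min, l_max = normalization_limits(4.6, 5.4)
low = normalize(4.6, l_min, l_max)
mid = normalize(5.0, l_min, l_max)
high = normalize(5.4, l_min, l_max)
```

With these limits the lower bound, midpoint and upper bound are mapped to 0.2, 0.5 and 0.8, which keeps the sigmoid units away from their saturated ends.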


SENSITIVITY  EQUATIONS 

In  this  section,  the  calculation  of  the  sensitivity  factors  through  the  differentiation  of  a  general  neural  net, 
with  N  entries,  M  exits  and  L  neurons  in  the  hidden  layer,  is  presented. 

The symbols used are:

$U_i$, $i = 0,...,N$ are the net entries and $U_0 = 1$ is a polarization entry

$f_i^a(\cdot)$, $i = 0,...,N$ are the normalization entry functions and $f_0^a(\cdot) = 1$

$X_i$, $i = 0,...,N$ are the normalized entries, $X_0 = U_0$

$W_{ij}^h$, $i = 1,...,L$ and $j = 0,...,N$ is the weight corresponding to neuron $i$ and entry $j$

$net_j^h = \sum_{i=0}^{N} W_{ji}^h X_i$, $j = 1,...,L$ product of weights times entries

$f_j^h(net_j^h)$, $j = 0,...,L$ with $f_0^h(net_0^h) = 1$ is the sigmoid function of the hidden layer

$I_j$, $j = 0,...,L$ are the corresponding values of the sigmoid function, $I_0 = 1$

$W_{ij}^o$, $i = 1,...,M$ and $j = 0,...,L$ is the weight of neuron $i$ of the exit layer and entry $j$ from the hidden layer

$net_j^o = \sum_{i=0}^{L} W_{ji}^o I_i$, $j = 1,...,M$ product of the weights times entries from the hidden layer

$f_j^o(net_j^o)$, $j = 1,...,M$ is the value of the sigmoid function for the exit layer

$Y_j$, $j = 1,...,M$ are the normalized exits of the net, obtained from the sigmoid function

$f_i^b(\cdot)$, $i = 1,...,M$ are the denormalization functions of the exits

$Z_i$, $i = 1,...,M$ are the net exit values

$emax_k$, $emin_k$, $k = 1,...,N$ higher and lower values of the entries

$smax_k$, $smin_k$, $k = 1,...,M$ higher and lower values of the exits

The procedure for obtaining the expressions of the sensitivity of the net:

$Z_1 = f_1^b(Y_1)$
$Z_2 = f_2^b(Y_2)$
$\vdots$
$Z_M = f_M^b(Y_M)$    5.

will now be described.


With an appropriate manipulation of the variables, Equation 6., which relates the entries to the exits of the net, is obtained:

$Z_k = f_k^b\left( f_k^o\left( \sum_{j=0}^{L} W_{kj}^o \, f_j^h\left( \sum_{i=0}^{N} W_{ji}^h \, f_i^a(U_i) \right) \right) \right)$, for $k = 1,...,M$    6.

By substituting the corresponding values for the functions $f_k^b(\cdot)$ and $f_k^o(\cdot)$, Equation 7. is obtained:

$Z_1 = \dfrac{1}{1 + \exp(-v_1)} \, [smax_1 - smin_1] + smin_1$

$Z_2 = \dfrac{1}{1 + \exp(-v_2)} \, [smax_2 - smin_2] + smin_2$

$\vdots$

$Z_M = \dfrac{1}{1 + \exp(-v_M)} \, [smax_M - smin_M] + smin_M$    7.

where:

$v_k = \sum_{j=0}^{L} W_{kj}^o \, f_j^h\left( \sum_{i=0}^{N} W_{ji}^h \, f_i^a(U_i) \right)$, for $k = 1,...,M$    8.

In a general form, Equation 7. becomes:

$Z_k = \dfrac{1}{1 + \exp(-v_k)} \, [smax_k - smin_k] + smin_k$, for $k = 1,...,M$    9.
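The sensitivity factors will be calculated from equation (10):

$\dfrac{\partial Z}{\partial U} = \begin{bmatrix} \dfrac{\partial Z_1}{\partial U_1} & \dfrac{\partial Z_1}{\partial U_2} & \cdots & \dfrac{\partial Z_1}{\partial U_N} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial Z_M}{\partial U_1} & \dfrac{\partial Z_M}{\partial U_2} & \cdots & \dfrac{\partial Z_M}{\partial U_N} \end{bmatrix}$    10.

where each term of the sensitivity matrix is calculated in the form:

$\dfrac{\partial Z_k}{\partial U_i} = [smax_k - smin_k] \, \dfrac{\exp(-v_k)}{(1 + \exp(-v_k))^2} \, \dfrac{\partial v_k}{\partial U_i}$    11.

Manipulating the derivative term of Equation 11. and taking into account Equation 8., the following expression is obtained:

$\dfrac{\partial v_k}{\partial U_i} = \sum_{j=1}^{L} W_{kj}^o \, \dfrac{\partial}{\partial U_i} f_j^h\left( \sum_{l=0}^{N} W_{jl}^h \, f_l^a(U_l) \right)$    12.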


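By differentiating Equation 12. and substituting the expression in Equation 11., Equation 13. is obtained, which allows the calculation of the sensitivity factors starting from the net:

$\dfrac{\partial Z}{\partial U} = \begin{bmatrix} (smax_1 - smin_1) \dfrac{\exp(-v_1)}{(1 + \exp(-v_1))^2} \\ (smax_2 - smin_2) \dfrac{\exp(-v_2)}{(1 + \exp(-v_2))^2} \\ \vdots \\ (smax_M - smin_M) \dfrac{\exp(-v_M)}{(1 + \exp(-v_M))^2} \end{bmatrix} \circ W^o \times \left( \begin{bmatrix} \dfrac{\exp(-\sum_{i=0}^{N} W_{1i}^h X_i)}{(1 + \exp(-\sum_{i=0}^{N} W_{1i}^h X_i))^2} \\ \dfrac{\exp(-\sum_{i=0}^{N} W_{2i}^h X_i)}{(1 + \exp(-\sum_{i=0}^{N} W_{2i}^h X_i))^2} \\ \vdots \\ \dfrac{\exp(-\sum_{i=0}^{N} W_{Li}^h X_i)}{(1 + \exp(-\sum_{i=0}^{N} W_{Li}^h X_i))^2} \end{bmatrix} \circ \begin{bmatrix} \dfrac{W_{11}^h}{emax_1 - emin_1} & \dfrac{W_{12}^h}{emax_2 - emin_2} & \cdots & \dfrac{W_{1N}^h}{emax_N - emin_N} \\ \dfrac{W_{21}^h}{emax_1 - emin_1} & \dfrac{W_{22}^h}{emax_2 - emin_2} & \cdots & \dfrac{W_{2N}^h}{emax_N - emin_N} \\ \vdots & \vdots & & \vdots \\ \dfrac{W_{L1}^h}{emax_1 - emin_1} & \dfrac{W_{L2}^h}{emax_2 - emin_2} & \cdots & \dfrac{W_{LN}^h}{emax_N - emin_N} \end{bmatrix} \right)$    13.

The operator "$\circ$" corresponds to the product of the first value of the column vector by the whole first matrix row, extended to all the rows of the matrix, and:

$W^o = \begin{bmatrix} W_{11}^o & W_{12}^o & \cdots & W_{1L}^o \\ W_{21}^o & W_{22}^o & \cdots & W_{2L}^o \\ \vdots & \vdots & & \vdots \\ W_{M1}^o & W_{M2}^o & \cdots & W_{ML}^o \end{bmatrix}$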
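The chain-rule structure of the sensitivity matrix (output scaling, output weights, hidden sigmoid derivative, input weights divided by the entry ranges) can be checked numerically. The sketch below is not the authors' code: the weights are random placeholders, the entry and exit ranges are invented, and min-max normalization of the entries is assumed. It compares the analytic matrix with central finite differences.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward(U, p):
    X = (U - p["emin"]) / (p["emax"] - p["emin"])   # normalized entries
    I = sigmoid(p["Wh"] @ X + p["bh"])              # hidden layer values
    Y = sigmoid(p["Wo"] @ I + p["bo"])              # normalized exits
    Z = Y * (p["smax"] - p["smin"]) + p["smin"]     # denormalized exits
    return Z, I, Y

def sensitivity(U, p):
    """Analytic dZ/dU; uses sigmoid'(v) = exp(-v)/(1+exp(-v))**2 = s*(1-s)."""
    _, I, Y = forward(U, p)
    out_scale = (p["smax"] - p["smin"]) * Y * (1.0 - Y)   # one factor per exit
    hid_scale = I * (1.0 - I)                             # one per hidden neuron
    scaled_Wh = p["Wh"] / (p["emax"] - p["emin"])         # column i / entry range i
    return (out_scale[:, None] * p["Wo"]) @ (hid_scale[:, None] * scaled_Wh)

rng = np.random.default_rng(0)
N, L, M = 6, 13, 2
p = {   # random placeholder weights and assumed ranges, not the paper's values
    "Wh": rng.normal(size=(L, N)), "bh": rng.normal(size=L),
    "Wo": rng.normal(size=(M, L)), "bo": rng.normal(size=M),
    "emin": np.array([4.0, 1.0, 0.05, 0.0, 5.0, 40.0]),
    "emax": np.array([6.0, 3.0, 0.20, 1.0, 13.0, 55.0]),
    "smin": np.array([3.0, 700.0]), "smax": np.array([4.2, 1000.0]),
}
U = np.array([5.0, 1.846, 0.12, 0.441, 9.098, 46.918])
J = sensitivity(U, p)

# Central finite differences as an independent check.
eps = 1e-6
J_fd = np.zeros_like(J)
for i in range(N):
    dU = np.zeros(N)
    dU[i] = eps
    J_fd[:, i] = (forward(U + dU, p)[0] - forward(U - dU, p)[0]) / (2 * eps)
max_error = np.max(np.abs(J - J_fd))
```

Multiplying a matrix row-wise by a column vector, written with `[:, None]` here, plays the role of the "o" product described in the text.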

DETERMINATION  OF  ADJUSTMENT  IN  THE  CONTROL  PARAMETERS 

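The steps needed to determine the adjustments corresponding to the control operation in the rolling process are described next:

I. The nominal entries and exits are contained in the vectors $X^* = (h_e^*, g^*, \mu^*, t_r^*, t_f^*, \bar{y}^*)$ and $Y^* = (h_s^*, P^*)$, respectively.

II. Through Equation 13. it is possible to calculate the sensitivities for the selected nominal point.

III. After calculating the sensitivity factors it is possible to obtain the linear equations of the process, Equations 14., around the nominal operation point:

$\Delta h_f = \dfrac{\partial h_f}{\partial h_e}\Delta h_e + \dfrac{\partial h_f}{\partial g}\Delta g + \dfrac{\partial h_f}{\partial \mu}\Delta \mu + \dfrac{\partial h_f}{\partial t_r}\Delta t_r + \dfrac{\partial h_f}{\partial t_f}\Delta t_f + \dfrac{\partial h_f}{\partial \bar{y}}\Delta \bar{y}$    14.

being $\Delta X = X^* - X$.

If a variation takes place in one or all of the operational parameters $h_e, g, \mu, t_r, t_f, \bar{y}$, an alteration will occur in the value of the exit thickness $h_f$. The existence of a factor $K$ such that $\Delta h_f = 0$ may be admitted. That is to say (15):

$0 = \Delta h_e \dfrac{\partial h_f}{\partial h_e} + \Delta g \dfrac{\partial h_f}{\partial g} + \Delta \mu \dfrac{\partial h_f}{\partial \mu} + \Delta t_r \dfrac{\partial h_f}{\partial t_r} + \Delta t_f \dfrac{\partial h_f}{\partial t_f} + \Delta \bar{y} \dfrac{\partial h_f}{\partial \bar{y}} + K$    15.

Equating Equations 14. and 15., we obtain:

$K = -\Delta h_f$    16.

The value of $K$ depends on the selected control parameter (roll gap, front or back tension) and may be defined as:

$K = \Delta g \dfrac{\partial h_f}{\partial g}$  or  $K = \Delta t_r \dfrac{\partial h_f}{\partial t_r}$  or  $K = \Delta t_f \dfrac{\partial h_f}{\partial t_f}$    17.

If control action by means of the roll gap is assumed, the equations would be:

$K = \Delta g \dfrac{\partial h_f}{\partial g} = -\Delta h_f$,  $\Delta g = -\dfrac{\Delta h_f}{\partial h_f / \partial g}$,  $g = g^* + \dfrac{\Delta h_f}{\partial h_f / \partial g}$    18.

Similarly, in terms of the front and back tensions:

$t_r = t_r^* + \dfrac{\Delta h_f}{\partial h_f / \partial t_r}$,  $t_f = t_f^* + \dfrac{\Delta h_f}{\partial h_f / \partial t_f}$    19. and 20.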


APPLICATION,  RESULTS  AND  CONCLUSION 

As an example of the possibilities of the method, a numerical application to a rolling process case is presented. The operation point was chosen as: $h_i$ = 5.0 mm; $h_o$ = 3.6 mm; $g$ = 1.846 mm; $\mu$ = 0.12; $t_f$ = 9.098 kgf/mm²; $t_b$ = 0.441 kgf/mm²; $\bar{y}$ = 46.918 kgf/mm²; W = 500 mm; E = 20,400 kgf/mm²; R = 292.1 mm; M = 500,000 kgf/mm² and P = 875.3 tf.

To obtain the data sets for ANN training, the parameter variations were chosen as: $h_i$ = ±8%; $h_o$ = ±3%; $\mu$ = ±20%; $t_f$ = ±30%; $t_b$ = ±30% and $\bar{y}$ = ±10%. Three different values were chosen for each parameter, resulting in $3^6$ = 729 training sets. The rolling load was obtained through Alexander's model [1] and the roll gap through the elastic equations for the rolling mill (Equation 21).
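The count of 729 follows from a full factorial design with three levels for each of the six varied parameters. The sketch below assumes the three levels are the lower bound, the nominal value and the upper bound, which the paper does not state explicitly; the dictionary keys are hypothetical names.

```python
from itertools import product

nominal = {"h_i": 5.0, "h_o": 3.6, "mu": 0.12,
           "t_f": 9.098, "t_b": 0.441, "y": 46.918}
variation = {"h_i": 0.08, "h_o": 0.03, "mu": 0.20,
             "t_f": 0.30, "t_b": 0.30, "y": 0.10}

# Assumed levels: nominal value and nominal value -/+ the stated variation.
levels = {k: [nominal[k] * (1 - variation[k]),
              nominal[k],
              nominal[k] * (1 + variation[k])]
          for k in nominal}

# Every combination of levels: 3**6 operating points.
training_points = [dict(zip(levels, combo))
                   for combo in product(*levels.values())]
n_points = len(training_points)
```

For each such point the rolling load and roll gap would then be computed with the process model to complete one training set.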


The final weights for the hidden and output layers, with their polarization weights, are listed below. Each row corresponds to one hidden neuron: the six weights $W^h$ for the net entries, the polarization weight $W^h_{bias}$, and the two weights $W^o$ connecting that neuron to the exits.

 Neuron |                        W^h                              |  W^h_bias |       W^o
      1 |   6.2010    9.4161   -1.7278   -0.2642    1.8475    1.3078 |  -11.4784 |   0.5503    0.4083
      2 |  -1.5239   -9.2425   -2.4238    0.0734    0.7813   -6.0229 |    8.9265 |  -1.7551   -2.1593
      3 | -11.4365   -1.7129    1.2562   -0.2171   -0.6941    3.3125 |    2.5654 |  -0.9245   -0.1960
      4 |  -0.2767   -8.3214   -5.5920   -0.2737   -3.3595   -0.8293 |   10.9804 |  -0.2976    0.2163
      5 |   3.3504    5.2707    3.4006    0.0343   -0.5938    4.7363 |  -10.4911 |   4.2976    0.9087
      6 |   8.9832    3.9726    0.7990    9.7730    0.9096    5.5016 |  -15.1094 |   0.0021    0.0045
      7 |  -6.1621    3.4810    7.4542    0.5558   -7.0447   -0.9006 |    5.3887 |   0.1479    0.1406
      8 |  -4.8199   -3.2388    6.5208   -3.4559   -0.6086    8.9722 |   -4.5520 |   0.1378    0.0480
      9 |   0.8914   -8.2071   -7.6049   -0.1518   -2.2323   -1.5593 |    7.1306 |  -1.1542   -0.0074
     10 |  -1.2378  -10.6099   -0.0338    0.0399    0.4850   -2.1458 |    8.8376 |  -6.4508    1.0470
     11 |   8.3913    2.3521    5.9967   -8.1868   -4.0553    1.8826 |   -0.4583 |  -0.0276   -0.0106
     12 |  -1.4947  -11.9576   -1.6808    0.0733    0.7387   -4.2602 |    8.7774 |  -3.9372    2.4463
     13 |   2.9411    4.8319  -11.5151   -1.5484   -4.0985   -4.2330 |    2.5081 |   0.1814    0.0746

W^o_bias = [6.4599; -1.4106]

The sequence of events to determine the control operation adjustments is described next:

I. Define the nominal inputs: $[h_i^*, g^*, \mu^*, t_b^*, t_f^*, \bar{y}^*]$ = [5.00; 1.846; 0.12; 0.441; 9.098; 46.918];

II. Provide the nominal outputs: $[h_o^*, P^*]$ = [3.6; 875.31];

III. Calculate through Equation 13 the sensitivity coefficients (Equation 14) for the selected nominal point: $[\partial h_o/\partial h_i;\ \partial h_o/\partial g;\ \partial h_o/\partial \mu;\ \partial h_o/\partial t_b;\ \partial h_o/\partial t_f;\ \partial h_o/\partial \bar{y}]$ = [0.3566; 0.6436; 4.7345; -0.0163; -0.011; 0.0327];

IV. In the presence of parameter variation, compute the current inputs as: $[h_i, g, \mu, t_b, t_f, \bar{y}]$ = [4.9; 1.846; 0.118; 0.432; 9.098; 46.918];

V. Using a previously trained ANN, determine the current outputs: $[h_o, P]$ = [3.552; 855.527];

VI. Determine the control parameters using Equations 18, 19 and 20: $[g, t_b, t_f]$ = [1.920; -2.504; 4.734]. This corresponds to corrections of +3.8%, -639.9% and -54%, respectively.

The smallest correction percentage indicates the best control action. Therefore, the correction of the output thickness deviation should be made through the roll gap. Notice that the value calculated for $t_b$ is negative and should be saturated at $t_b = 0$.
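The three candidate adjustments can be reproduced from the figures quoted in steps I to VI. The sketch below assumes that each control parameter is corrected on its own from its nominal value, by the thickness deviation divided by the corresponding sensitivity; this reading is inferred from the numbers rather than stated in this form, and the dictionary names are hypothetical.

```python
# Nominal control parameters and sensitivities d(h_o)/d(parameter)
# quoted in steps I and III.
nominal = {"g": 1.846, "t_b": 0.441, "t_f": 9.098}
sens = {"g": 0.6436, "t_b": -0.0163, "t_f": -0.011}

h_nominal, h_current = 3.6, 3.552     # steps II and V
dh = h_nominal - h_current            # thickness deviation to be recovered

# Single-parameter corrections: each one alone should restore h_o.
raw = {k: nominal[k] + dh / sens[k] for k in nominal}

# A negative tension is not physically applicable, so it is saturated at zero.
adjusted = dict(raw)
adjusted["t_b"] = max(adjusted["t_b"], 0.0)
```

Under this assumption the unsaturated results agree with the values of step VI to three decimals, and only the back tension needs saturating.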


In order to verify the calculated adjustments, the three possible corrections were simulated through an interactive procedure using Alexander's model and Equation (21). For the roll gap action ($g$ = 1.920) the value of the output thickness was 3.585 mm, with an error of 0.42%. For the back tension action ($t_b$ = 0) the value of the output thickness was 3.555 mm, with an error of 1.25%, and for the front tension action ($t_f$ = 4.734) the value of the output thickness was 3.584 mm, with an error of 0.44%.

ACKNOWLEDGEMENT 

This  work  was  accomplished  with  support  of  the  Pontifical  Catholic  University  of  Minas  Gerais,  PUC-MG, 
SaicSystems  and  Mechanical  Conformation  Research  Group  (UFMG-Brazil). 


REFERENCES 

1.  Alexander,  J.  M.,  1972.  On  the  theory  of  rolling.  Proc.  R.  Soc.  Lond.  A.  326,  pp.  535-563. 

2.  Zarate,  L.E.,  1998.  Doctoral  Thesis,  Federal  University  of  Minas  Gerais,  Belo  Horizonte,  MG,  Brazil. 




Artificial  Intelligence  Approach  to  The  Modeling  of  Rolling  Loads 
in  Technology  Design  for  Cold  Rolling  Processes 

J.  Kusiak*,  J.G.  Lenard**,  K.  Dudek* 

*Akademia Gorniczo-Hutnicza, Mickiewicza 30, 30-059 Krakow, Poland
**  University  of  Waterloo,  Waterloo,  Ont.  N2L  3G1,  Canada 


ABSTRACT 

The paper presents an attempt to apply artificial neural networks to the prediction of the influence of various
frictional conditions on rolling forces and torques. The network was trained on experimental data
consisting of load measurements made during cold rolling of aluminum alloys under different
lubrication conditions. The properties of the lubricant were the input variables for the neural network.
The main capability of the model is accurate prediction of the rolling forces and torques during cold rolling
under varying frictional conditions. The artificial neural network was validated using data that were not used
during the training procedure. Next, the predictions of the artificial neural network were compared with
finite element calculations of rolling under varying friction conditions. This validation confirmed the good
predictive ability of the ANN model.


INTRODUCTION 

The quality and dimensional accuracy of rolled products can be controlled, provided each parameter of the
rolling process is well known. The development of computer techniques makes it possible to calculate
these parameters using numerical methods, based mainly on finite-element models. These methods require
long computing times and therefore cannot be implemented in an on-line control system. An additional
difficulty concerns the proper definition of the boundary conditions necessary for numerical calculations.
Therefore, alternative techniques based on artificial intelligence appear very useful in the numerical modeling of
complex problems in rolling.

Appropriate modeling of frictional conditions appears to be the main difficulty in the analysis of cold
rolling processes. The general goal of the paper is therefore to apply artificial neural
networks to the prediction of the influence of various frictional conditions on the rolling forces and torques.

The network was trained on experimental data consisting of load
measurements made during cold rolling of aluminum alloys under different lubrication conditions. The properties of
the lubricant were the input variables for the neural network. The main capability of the model is accurate
prediction of the rolling forces and torques during cold rolling under varying frictional conditions.
Additionally, because the ANN model is fast, its predictive capability has a great advantage over the classical
approach based on FEM modeling. The ANN model can therefore be readily
used in the design of new rolling technology, as well as in on-line control systems of the rolling process.


ALUMINUM  COLD  ROLLING 

Cold, flat rolling of aluminum is usually performed in either the boundary or the mixed lubrication regime,
in which some metal-to-metal contact occurs in addition to pockets of lubricant that separate the roll and
the rolled metal. In those two regimes control of friction is paramount, and it is achieved by the
appropriate choice of boundary additives.

Matsui et al. [1] used a paraffin-based oil mixed with lauryl alcohol, lauric acid and methyl laurate, at a
5% (v/v) concentration. In that study, the preferable additive for rolling of aluminum was the alcohol.
Kihara [2] evaluated the effects of butyl laurate, lauric acid, and lauryl alcohol, at 5% (v/v), in a low
viscosity paraffin-based oil. Friction was lowest with the alcohol. Nautiyal and Schey [3] observed that
both lauric and oleic acids are ineffective but stearic acid is able to lower friction significantly. Stearyl
alcohol, useless at 1%, gave significant improvement at 5%.

0-7803-5489-3/99/$10.00 ©1999 IEEE.

Williams [4] writes that in boundary lubrication regimes, where film thickness is only a few molecules
deep, viscosity and density of the lubricant have less effect on frictional resistance than either its chemical
composition or the properties of the rolled metal. The term "oiliness" is used to define the lubricating
properties of the film. SAE defines oiliness as signifying "differences in friction greater than can be
accounted for on the basis of viscosity...". Boundary additives of long chain molecules, typically alcohols or
acids, assist base oils in forming films which adsorb on the surface. The additive concentration should be
sufficient to cover the necessary surfaces but should not affect the bulk attributes of the base oil.

Material,  Equipment  and  Procedure 

Equipment: Experiments were carried out on a two-high rolling mill, with rolls of 254 mm diameter by 100
mm in length. The rolls were made of D2 tool steel, hardened to $R_c = 54$. Surface roughness was 0.18 µm. The
mill was powered by a 30 kW, constant torque, DC motor through a Mack truck transmission, with continuously
variable speed. The mill was instrumented with two load cells placed under the bearing blocks of the bottom
roll, two torque transducers placed on the drive spindles, one shaft encoder and an LVDT to monitor roll gap.

Material: The strip material was 1100-H14 aluminum, containing 0.05% Mn, 1% Si, 0.05 - 0.2% Cu and 0.1%
Zn. The samples were 1 mm x 25 mm x 300 mm long, cut parallel to the original direction of rolling. The plane
strain, true stress - true strain curve of the metal is given by $\sigma = 110.9\,(1 + 100\varepsilon)^{0.11}$ MPa.
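The flow curve quoted for the strip can be evaluated directly. The following is a minimal sketch, not part of the original work: the function name is illustrative, and converting a thickness reduction to true strain as ln(1/(1 - r)) is an assumption made here for the example.

```python
import math


def flow_stress_mpa(true_strain: float) -> float:
    """Plane-strain flow stress, sigma = 110.9 * (1 + 100 * eps) ** 0.11, in MPa."""
    if true_strain < 0.0:
        raise ValueError("true strain must be non-negative")
    return 110.9 * (1.0 + 100.0 * true_strain) ** 0.11


def true_strain_from_reduction(reduction: float) -> float:
    """Assumed conversion: logarithmic thickness strain for a fractional reduction r."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must lie in [0, 1)")
    return math.log(1.0 / (1.0 - reduction))


if __name__ == "__main__":
    for r in (0.1, 0.2, 0.3, 0.4):
        eps = true_strain_from_reduction(r)
        print(f"reduction {r:.0%}: strain {eps:.3f}, flow stress {flow_stress_mpa(eps):.1f} MPa")
```

At zero strain the expression returns the initial flow stress of 110.9 MPa, and the small exponent gives a weakly hardening curve, as expected for H14 (strain-hardened) stock.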

Lubricant: Mineral seal oil was used as the base and the effect of increasing concentrations of boundary
additives - lauric acid, stearic acid, lauryl alcohol and stearyl alcohol - was examined. The average density
was 850 kg/m³ and the kinematic viscosity was 4.5 mm²/s at 40°C. The additive concentration was varied
from 1 to 5% (v/v). These amounts did not change the effective viscosity or density significantly.

Procedure: Before each pass, the rolls and specimens were degreased with n-heptane, a neutral cleaner. Ten
drops of lubricant were spread uniformly on all surfaces of the specimens. After rolling, the strips were
degreased again. The independent variables were the reduction, the rolling speed and the additive concentration.


FEM  Simulation 

Development of a method to evaluate the friction coefficient for various lubricants was one of the
objectives of the project. Measurements were compared with calculations performed assuming various
friction coefficients. All experiments were simulated using the finite-element technique.
A detailed description of the applied finite-element model is given in [5,6]. Figure 1 shows typical results,
which were obtained for 3% stearyl alcohol by volume, with mineral seal oil as lubricant. Numerical results
are compared with the measurements performed for various roll velocities.


Fig.  1.  Calculated  rolling  force  (a)  and  roll  torque  (b)  compared  with  the  experimental  data  obtained  for 
various  roll  velocities;  lubricant  3%  stearyl  alcohol. 




Analysis  of  the  results  obtained  for  the  four  lubricants  with  different  additive  concentrations  should  allow 
evaluation  of  the  friction  coefficient  relevant  for  each  particular  lubricant.  However,  scatter  of  the 
experimental  measurements  (see  Figure  1)  caused  problems  with  interpretation  of  the  results.  Therefore,  an 
attempt  was  undertaken  to  apply  an  artificial  neural  network  to  overcome  these  difficulties. 


ARTIFICIAL  NEURAL  NETWORK  APPROACH 

Artificial neural networks have become powerful tools to simulate and control various processes. Numerous
examples of applying ANN to metal forming are found in the scientific literature. Among the many
publications, those dealing with control of rolling mills [7,8] as well as with prediction of yield strength in
plate mills [9], rolling loads [10,11,12] and plate bending in asymmetrical rolling [13] should be mentioned.

Our  main  goal  was  to  establish  a  relation  between  lubricant  additive,  its  concentration  and  the  rolling 
conditions.  Because  there  is  no  mathematical  model  of  such  relationships,  our  aim  was  to  apply  an  artificial 
neural  network  approach  to  solve  the  problem.  The  trained  ANN  should  predict  forces  and  torques  during  cold 
rolling  under  different  lubrication  conditions.  The  neural  network  was  trained  using  experimental  data, 
consisting  of  load  and  torque  measurements  during  cold  rolling  of  Al-alloy  specimens.  The  rolling  process 
was  performed  using  different  lubrication  conditions,  as  described  above.  The  input  variables  were: 

•  the type of lubricant additive,

•  the additive concentration,

•  the rolling speed,

•  the reduction.

The training data set included experimental data from all rolling tests with different lubricants and different
additive concentrations, except the data for rolling with a 3% concentration of additives. These data were reserved
to test the trained network. The outputs of the ANN were:

•  rolling  force, 

•  rolling  torque. 

Different  network  topologies  were  tested  during  training.  The  best  results  were  obtained  for  a  network  with 
one  hidden  layer  of  20  neurons.  Thus,  the  final  network  topology  was  4-20-2.  The  trained  network  was  then 
tested  using  data  from  outside  the  training  data  set,  i.e.,  results  of  rolling  with  3%  concentration  of  additives. 
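The 4-20-2 topology and the hold-out of the 3% data can be sketched as follows. This is an illustrative sketch only: the paper does not state the activation function, the input scaling, how the additive type is encoded as a single input, or the training algorithm, so the tanh hidden layer, the numeric additive code and the random untrained weights below are all assumptions.

```python
import math
import random

N_INPUTS, N_HIDDEN, N_OUTPUTS = 4, 20, 2  # the 4-20-2 topology reported above


def init_layer(n_in, n_out, rng):
    """One weight row per neuron; the last entry of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(x, hidden, output):
    """Map [additive code, concentration, speed, reduction] to [force, torque]."""
    # zip() stops at len(x), so the trailing bias entry is added separately.
    h = [math.tanh(row[-1] + sum(w * v for w, v in zip(row, x))) for row in hidden]
    return [row[-1] + sum(w * v for w, v in zip(row, h)) for row in output]


def split_by_concentration(records, held_out=3.0):
    """Reserve every record at the held-out additive concentration for testing."""
    train = [r for r in records if r["concentration"] != held_out]
    test = [r for r in records if r["concentration"] == held_out]
    return train, test


rng = random.Random(0)
hidden = init_layer(N_INPUTS, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_OUTPUTS, rng)

# Hypothetical, already-scaled inputs; the weights are untrained, so the
# two outputs carry no physical meaning here.
force, torque = forward([0.25, 0.6, 0.2, 0.3], hidden, output)

# Hypothetical records: only the concentration field matters for the split.
records = [{"concentration": c} for c in (1.0, 3.0, 5.0)]
train, test = split_by_concentration(records)
```

Holding out a whole concentration level, rather than random samples, tests whether the network interpolates between concentrations it has seen.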


[Figure 2: two scatter plots. (a) Horizontal axis: rolling force - measurements, N/mm (0 to 4000).
(b) Horizontal axis: rolling torque - measurements, Nm/mm (0 to 30).]

Fig. 2. Comparison of measured and predicted (by ANN) values of forces (a) and torques (b) of rolling with 3%
concentration of the stearic acid additive.




Measured and ANN-predicted values of force and torque for rolling with a 3%
concentration of stearic acid additive are compared in Figure 2. Good agreement between the measurements
and the predictions of the artificial neural network is observed. Similar agreement was obtained for rolling with
a 3% concentration of the other additives, i.e. the predictions agreed well with all of the unseen data.

The next step of the validation procedure was to compare the values predicted by the ANN model with the
results of FEM calculations. Typical results obtained for the 3% concentration of lauryl alcohol additive are
shown in Figure 3. Thick dotted and solid lines in this figure represent predictions of the artificial neural
network for 20 rpm and 100 rpm, respectively. The network smoothed the experimental results and allowed
the conclusion that an increase in rolling velocity decreases the friction coefficient. This phenomenon was
observed for all tested lubricants. Comparison of the curve shapes predicted by the finite element program
and by the artificial neural network reveals some differences. The rolling force calculated by the FEM code
increases with reduction slightly more slowly than that predicted by the ANN.

[Figure 3: curves plotted against reduction. The surviving legend entries identify measurements at
20 rpm (+) and 40 rpm (×).]

Fig. 3. Results of measurements, FEM calculations and predictions by ANN for 3% lauryl alcohol additive.

Analysis of all results shows that a combination of finite element simulation with neural network prediction
is a good approach to evaluating the friction coefficient. The neural network eliminates the scatter of the
experimental results and yields a monotonic relationship between loads and reduction. Comparison of the
ANN representation of the results with FEM calculations for various friction coefficients allows this
coefficient to be chosen more accurately.
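One way to make this comparison concrete is to choose the friction coefficient whose FEM curve lies closest to the smoothed ANN curve. The sketch below assumes a least-squares criterion, which the paper does not specify, and all force values are hypothetical placeholders rather than data from the experiments.

```python
def sum_squared_error(curve_a, curve_b):
    """Sum of squared differences between two curves sampled at the same reductions."""
    return sum((a - b) ** 2 for a, b in zip(curve_a, curve_b))


def pick_friction_coefficient(ann_forces, fem_forces_by_mu):
    """Return the friction coefficient whose FEM curve best matches the ANN curve."""
    return min(
        fem_forces_by_mu,
        key=lambda mu: sum_squared_error(ann_forces, fem_forces_by_mu[mu]),
    )


# Hypothetical rolling forces (N/mm) at four reductions.
ann_forces = [1500.0, 2100.0, 2600.0, 3000.0]
fem_forces_by_mu = {
    0.05: [1350.0, 1900.0, 2350.0, 2700.0],
    0.10: [1480.0, 2080.0, 2620.0, 3040.0],
    0.15: [1650.0, 2300.0, 2900.0, 3400.0],
}

best_mu = pick_friction_coefficient(ann_forces, fem_forces_by_mu)
```

Because the ANN curve is free of experimental scatter, the match is less sensitive to individual outlying measurements than a direct fit to the raw data would be.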


CONCLUSIONS 

The main advantage of the model is its ability to predict accurately the rolling forces and torques during cold
rolling under varying frictional conditions. Additionally, because the ANN model is fast, its predictive capability
has a great advantage over the classical approach based on FEM modeling.
The artificial neural network was validated on data that were not used during the training
procedure. The validation confirmed the good predictive ability of the model. The ANN model can therefore
be readily used in the design of new rolling technology, as well as in on-line control systems of the rolling
process. The artificial neural network developed in the present work was implemented in a computer
program that designs rolling schedules for a reversing four-high mill.

ACKNOWLEDGEMENT 

Financial support of KBN (project AGH No 11.11.110.16), NATO and NSERC is gratefully
acknowledged.




REFERENCES 

1. Matsui, K., Matsushita, T., Takatsuka, K., Yamaguchi, Y., 1984. Advanced Techn. of Plasticity, 1, 247.

2. Kihara, J., 1990. Advanced Techn. of Plasticity, 4, 1693.

3. Nautiyal, P.C., Schey, J.A., 1990. ASME J. Tribol., 112, 282.

4. Williams, J.A., 1994. Engineering Tribology, Oxford University Press, Oxford.

5. Pietrzyk, M., Lenard, J.G., 1991. Thermal-Mechanical Modelling of the Flat Rolling Processes,
Springer-Verlag, Berlin.

6. Pietrzyk, M., 1992. Metody numeryczne w przerobce plastycznej metali, skrypt AGH 1303, Krakow
(in Polish).

7. Roscheisen, M., Hofmann, R., Tresp, V., 1992. In Advances in Neural Information Processing Systems 4
(M. Kaufman, ed.), 659.

8. Too, J.J.M., Ide, K., Maheral, P., Pussegoda, N., Sherwood, E.G., Gomi, T., 1995. 37th Mechanical
Working and Steel Processing Conference, Hamilton, 555.

9. Tsoi, A.C., 1992. In Advances in Neural Information Processing Systems 4 (M. Kaufman, ed.), 698.

10. Hwu, Y.J., Lenard, J.G., 1995. 37th Mechanical Working and Steel Processing Conf., Hamilton, 549.

11. Larkiola, J., Myllykoski, P., Nylander, J., Korhonen, A.S., 1996. Metal Forming 96 (M. Pietrzyk, J.
Kusiak, P. Hartley, I. Pillinger, eds), Krakow, J. Material Processing Technology, 60, 381.

12. Wiklund, O., 1996. Steel Strip 96, Opava, 136.

13. Kusiak, J., Pietrzyk, M., Wilk, K., 1997. KomPlasTech 97 (A. Piela, J. Kusiak, M. Pietrzyk, eds),
Ustron-Jaszowiec, 207 (in Polish).




Direct  Determination  of  Sequences  of 
Passes  for  the  Strip  Rolling  Process 
by  Means  of  Fuzzy  Logic  Rules 


C.D.M. Pataro* and H. Helman**

*  Departamento de Engenharia Eletronica, Escola de Engenharia da UFMG
Av. Antonio Carlos, 6627, Pampulha, Belo Horizonte, MG
** Departamento de Engenharia Metalurgica, Escola de Engenharia da UFMG
Rua Espirito Santo, 35, Centro, Belo Horizonte, MG, Brazil


ABSTRACT 

In  this  work  the  direct  determination  of  sequences  of  passes  for  the  strip  rolling  process  by  means  of  fuzzy 
logic  rules  is  presented.  The  variables  Rolling  Load,  Accumulated  Deformation  and  Aimed  Deformation 
are  expressed  in  linguistic  terms,  such  as  high,  medium  and  low  and  their  corresponding  sublevels.  These 
rules  can  be  established  in  agreement  with  data  obtained  from  theoretical  models,  rolling  of  samples  of  the 
strip  or  by  means  of  available  databases  for  this  particular  material  and  operational  conditions.  In  the  last 
two  cases,  the  procedure  becomes  independent  of  the  knowledge  of  mechanical  and  metallurgical 
characteristics  of  the  process. 

INTRODUCTION 

The method used to determine sequences of passes for the strip rolling process usually involves an iterative
process [1,3,4]. This can be a very time-consuming procedure, even if a fast computer is used. A direct
determination of the sequences of passes was developed by Pataro, Resende and Helman in 1994, using
Neural Networks [8,11], thus speeding up these calculations. Like Neural Networks, Fuzzy Logic
[5,6,12,13] has also proved adequate for the determination of loads and gaps in the rolling process [9,10].
In this work a method for the determination of the sequences of passes based on Fuzzy Logic rules is
described. These rules may be derived from a database, such as the curves shown in Figure 1.

The relationship "rolling load - deformation" varies significantly with large changes in operational
variables such as friction, strip width, yield stress, and entry thickness. These variations are not linear; they
not only alter the slope but also cause a displacement of the curves mentioned. Several theoretical models
permit the calculation of this load [3].

Families of curves as a function of logarithmic deformations are produced with that aim. Using the
Accumulated Deformation ($\varepsilon_i$) as a parameter, such curves show the relationship between the Rolling Load
(P) and Deformation ($\varepsilon$). These variables must be obtained for the same material, with a constant width and
initial thickness, rolled according to an adequate accumulated deformation schedule. The curve relative to
$\varepsilon_i$ should also be plotted for the various desired deformation values, starting at the value of $h_i$ and
according to the accumulated deformation $\varepsilon_i$. These curves can be obtained by using experimental values or
theoretical models.

Assuming that $\varepsilon_1 < \varepsilon_2 < \varepsilon_i < \varepsilon_n$ in Figure 1, the curve with deformation $\varepsilon_0$ represents the behaviour of the
strip in its original form, as received to be rolled. This condition can be the result of a full annealing
process or of an already hardened material with an unknown history of deformations.

For a direct calculation of the sequence of passes, assuming that logarithmic deformations are used
(so that the pass deformations can be added up to the final desired deformation) and supposing that the final
deformation $\varepsilon_5$ is the target, the following procedure should be carried out:




1 - Search for the largest deformation value for which the maximum rolling load is not exceeded. This is done by
finding, on the curve with accumulated deformation $\varepsilon_0$, the point where the rolling load reaches the
maximum, $P_{max}$. This point defines the logarithmic deformation $\varepsilon_4$ (point B of Figure 1).


Fig. 1. Typical aspect of a family of curves "Rolling Load - Deformation", using Accumulated
Deformation as a parameter. It describes the procedure for the direct calculation of a pass sequence.


2 - Determine the deformation still required to achieve the final objective:

$\varepsilon_r = \varepsilon_3 = \varepsilon_5 - \varepsilon_4$

3 - Check whether it is possible to apply that deformation without exceeding the maximum load. This verification
should be done on the curve with accumulated deformation $\varepsilon_4$, since the material is now in this
condition. In this case point C is found, which lies above the acceptable maximum load.

4 - Return to step 1, now looking for the desired deformation value corresponding to $P_{max}$ on the curve
with the value of $\varepsilon_4$ for the accumulated deformation.

5 - Repeat these steps until the final desired deformation has been achieved. In Figure 3, the steps $\varepsilon_4$, $\varepsilon_2$ and $\varepsilon_4$, which
correspond to the sequence of steps necessary to accomplish the final deformation $\varepsilon_5$, are seen. Check that
the sum of the segments ab, cd and ef corresponds to the deformation $\varepsilon_5$.
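The five steps above can be sketched as a short routine. This is an illustrative sketch under stated assumptions: `toy_rolling_load` is a hypothetical stand-in for the family of curves in Figure 1 (it is not the Bland-Ford model or any curve from the paper), the load is assumed to increase monotonically with pass deformation, and the maximum deformation at $P_{max}$ is located by bisection rather than read from a chart.

```python
def toy_rolling_load(acc_strain, pass_strain):
    """Hypothetical load curve (t): grows with pass strain and with prior hardening."""
    hardening = 1.0 + 1.5 * acc_strain
    return 120.0 * hardening * pass_strain ** 0.5


def max_strain_for_load(load_fn, acc_strain, p_max, upper=2.0, tol=1e-9):
    """Steps 1 and 4: largest pass strain on the curve for acc_strain with load <= p_max."""
    lo, hi = 0.0, upper
    if load_fn(acc_strain, hi) <= p_max:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if load_fn(acc_strain, mid) <= p_max:
            lo = mid
        else:
            hi = mid
    return lo


def pass_sequence(load_fn, target_strain, p_max, acc_strain=0.0, max_passes=50):
    """Split a target logarithmic deformation into passes that respect p_max."""
    passes = []
    remaining = target_strain  # step 2: deformation still required
    while remaining > 1e-9:
        if len(passes) >= max_passes:
            raise RuntimeError("pass limit reached; p_max may be too low")
        if load_fn(acc_strain, remaining) <= p_max:  # step 3: can we finish now?
            step = remaining
        else:
            step = max_strain_for_load(load_fn, acc_strain, p_max)
        if step <= 0.0:
            raise RuntimeError("no positive deformation is possible at this load")
        passes.append(step)
        acc_strain += step  # logarithmic deformations are additive
        remaining -= step
    return passes


schedule = pass_sequence(toy_rolling_load, target_strain=0.5, p_max=60.0)
```

Each pass is evaluated on the curve for the deformation accumulated so far, which is why later passes are smaller at the same load limit: the material has work-hardened.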




ELABORATION  OF  PASS  SEQUENCES  USING  FUZZY  LOGIC  RULES 

The rules corresponding to the following example were produced from the data shown in Figure 2, based
on the Bland-Ford model [2], for a 100 mm wide steel strip with the following parameters of the Ludwik equation:
A = 9.6 kgf/mm²; B = 75 kgf/mm² and η = 0.3.
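For reference, the Ludwik hardening law has the form σ = A + B ε^η. The sketch below evaluates it with the parameters quoted above; the function name is illustrative, and it is an assumption here that the exponent printed in the text is the Ludwik exponent η.

```python
def ludwik_flow_stress(strain, a=9.6, b=75.0, exponent=0.3):
    """Ludwik law, sigma = A + B * strain ** exponent, in kgf/mm^2."""
    if strain < 0.0:
        raise ValueError("strain must be non-negative")
    return a + b * strain ** exponent


if __name__ == "__main__":
    for eps in (0.0, 0.1, 0.2, 0.4):
        print(f"strain {eps:.1f}: flow stress {ludwik_flow_stress(eps):.1f} kgf/mm^2")
```

The constant A is the flow stress at zero strain, so the curve starts at 9.6 kgf/mm² and rises steeply at small strains because the exponent is well below one.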


Fig.2.  Curves  used  to  establish  the  fuzzy  logic  rules 
for  the  calculation  of  sequences  of  passes. 

In accordance with the method for the direct determination of sequences of passes described in the first
part of this work, the accumulated and aimed deformations are used to determine the rolling load. The
rolling load and the accumulated deformation are used to obtain the aimed deformation for each pass.
The accumulated deformations are classified in linguistic terms as follows: Super-Low (SL), Low (L),
Medium (M), High (H) and Super-High (SH). The aimed deformations are classified as follows: Super-Low
(SL), Medium-Low (ML), Low (L), Medium-Medium (MM), Medium (M), Medium-High (MH),
High (H), High-High (HH) and Super-High (SH). The rolling load is classified in levels ranging from
C1 to C10. Figures 3, 4 and 5 show the membership functions (μ) for these variables.
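Fuzzification of a crisp value into these linguistic terms can be sketched as follows. The shapes and breakpoints of the membership functions in Figures 3 to 5 are not recoverable from the text, so the evenly spaced triangular functions over 0 to 50% used below are purely hypothetical.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at the peak."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical breakpoints (percent deformation) for the five ACD terms.
ACD_TERMS = {
    "SL": (-12.5, 0.0, 12.5),
    "L": (0.0, 12.5, 25.0),
    "M": (12.5, 25.0, 37.5),
    "H": (25.0, 37.5, 50.0),
    "SH": (37.5, 50.0, 62.5),
}


def fuzzify(x, terms):
    """Degree of membership of x in every linguistic term."""
    return {name: triangular(x, *abc) for name, abc in terms.items()}


memberships = fuzzify(20.0, ACD_TERMS)
```

With these assumed breakpoints, an accumulated deformation of 20% belongs partly to L and partly to M, so two columns of each rule table would fire with different strengths.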


Fig. 3. Membership functions for accumulated deformation


Fig. 4. Membership functions for aimed deformation


Fig. 5. Membership functions for rolling load

Table  1  describes  the  rules  that  determine  the  rolling  load  as  a  function  of  the  accumulated  deformation 
and  desired  deformation.  ACD  is  the  accumulated  deformation;  AID  is  the  aimed  deformation  and  RLO, 
the  rolling  load. 


Table 1: Rules used for the determination of the rolling load,
as a function of the accumulated deformation and desired deformation.

AID \ ACD    SL     L      M      H      SH
SL           C1     C2     C2     C3     C3
ML           C2     C3     C3     C4     C4
L            C3     C4     C4     C4     C5
MM           C4     C5     C5     C5     C6
M            C5     C6     C6     C6     C7
MH           C6     C6     C7     C7     C8
H            C7     C7     C8     C8     C9
HH           C7     C8     C9     C9     C10
SH           C8     C9     C9     C10    C10



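Table 2: Rules used for the final deformation, as a function of the
accumulated deformation and desired deformation

AID \ ACD    SL     L      M      H      SH
SL           SB     B      M      A      SA
ML           MB     MM     MA     AA     SA
L            B      M      A      SA     SA
MM           MM     MA     AA     SA     SA
M            M      A      SA     SA     SA
MH           MA     AA     SA     SA     SA
H            A      SA     SA     SA     SA
HH           AA     SA     SA     SA     SA
SH           SA     SA     SA     SA     SA

The output labels appear to follow Portuguese abbreviations: SB = super-low, MB = medium-low, B = low,
MM = medium-medium, M = medium, MA = medium-high, A = high, AA = high-high, SA = super-high.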

Table 3: Rules used for the determination of the desired deformation,
as a function of accumulated deformation and rolling load

RLO \ ACD    SL     L      M      H      SH
C2           SL     SL     SL     SL     SL
C3           L      SL     SL     SL     SL
C4           M      MM     L      SL     SL
C5           MH     M      MM     M      L
C6           H      MH     M      M      MM
C7           HH     H      H      MH     MH
C8           SH     HH     HH     HH     H
C9           SH     SH     SH     SH     HH
C10          SH     SH     SH     SH     SH
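A rule base of this kind maps naturally onto a lookup table. The sketch below encodes Table 3 and returns the aimed-deformation class for a given load class and accumulated-deformation class. It reads the row printed between C6 and C8 as C7, and it performs only a crisp lookup of a single rule; the inference and defuzzification method used by the authors is not described, so none is implied here.

```python
ACD_CLASSES = ("SL", "L", "M", "H", "SH")

# Table 3: rows are rolling-load classes, columns follow ACD_CLASSES.
TABLE_3 = {
    "C2": ("SL", "SL", "SL", "SL", "SL"),
    "C3": ("L", "SL", "SL", "SL", "SL"),
    "C4": ("M", "MM", "L", "SL", "SL"),
    "C5": ("MH", "M", "MM", "M", "L"),
    "C6": ("H", "MH", "M", "M", "MM"),
    "C7": ("HH", "H", "H", "MH", "MH"),
    "C8": ("SH", "HH", "HH", "HH", "H"),
    "C9": ("SH", "SH", "SH", "SH", "HH"),
    "C10": ("SH", "SH", "SH", "SH", "SH"),
}


def aimed_deformation_class(load_class, acd_class):
    """Consequent of the rule 'IF RLO is load_class AND ACD is acd_class'."""
    return TABLE_3[load_class][ACD_CLASSES.index(acd_class)]


# An undeformed strip (ACD = SL) rolled at load level C7 admits a HH pass;
# the same load on heavily deformed strip (ACD = SH) admits only MH.
first_pass = aimed_deformation_class("C7", "SL")
late_pass = aimed_deformation_class("C7", "SH")
```

Reading along a row shows the work-hardening effect: for a fixed load level, the admissible deformation class generally falls as the accumulated deformation rises.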

RESULTS 

A computer program was written following the procedure described in the previous sections.
Several sequences of passes for different desired deformations were simulated. The rules
employed are adequate for the given material, with the mechanical characteristics presented in Figure 2,
but they can also be used for similar materials.

Table 4: Sequence of passes for a maximum rolling load of 60 t,
desired deformation of 40%. Obtained deformation: 39%.

Pass    AID (%)    RLO (t)
1       26.0       60
2       13.0       45

Table 5: Sequence of passes for a maximum rolling load of 50 t,
desired deformation of 30%. Obtained deformation: 29%.

Pass    AID (%)    RLO (t)
1       21.0       50
2       8.0        40



Table 6: Sequence of passes for a maximum rolling load of 50 t,
desired deformation of 40%. Obtained deformation: 40.6%.

Pass    AID (%)    RLO (t)
1       21.0       50
2       13.5       50
3       6.1        38

Table 7: Sequence of passes for a maximum rolling load of 70 t,
desired deformation of 40%. Obtained deformation: 40.6%.

Pass    AID (%)    RLO (t)
1       33.5       70
2       7.1        40
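The figures in Tables 4 to 7 can be checked with a few lines of arithmetic: the pass deformations should add up to the obtained deformation, and no pass should exceed the load limit. The sketch below reports the deviation from the desired deformation in percentage points; how the error quoted in the conclusion is defined is not stated, so that choice is an assumption.

```python
# (desired %, obtained %, load limit in t, [(AID %, RLO t), ...]) from Tables 4-7.
SCHEDULES = {
    "Table 4": (40.0, 39.0, 60, [(26.0, 60), (13.0, 45)]),
    "Table 5": (30.0, 29.0, 50, [(21.0, 50), (8.0, 40)]),
    "Table 6": (40.0, 40.6, 50, [(21.0, 50), (13.5, 50), (6.1, 38)]),
    "Table 7": (40.0, 40.6, 70, [(33.5, 70), (7.1, 40)]),
}


def check_schedule(desired, obtained, load_limit, passes):
    """Return (sum of pass deformations, peak load, deviation in percentage points)."""
    total = sum(aid for aid, _ in passes)
    peak = max(load for _, load in passes)
    return total, peak, abs(total - desired)


results = {name: check_schedule(*row) for name, row in SCHEDULES.items()}

if __name__ == "__main__":
    for name, (total, peak, deviation) in results.items():
        print(f"{name}: total {total:.1f}%, peak load {peak} t, deviation {deviation:.1f} points")
```

In every table the first pass runs at the load limit, which is what the direct procedure is designed to do; the final pass simply takes up the remaining deformation.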

CONCLUSION 

The method developed is suitable for determining sequences of passes in flat rolling, as the simulations
show. The maximum error found in the examples is less than 3.0% in the final desired deformation,
and it can be reduced by increasing the number of linguistic terms. The procedure may be extremely useful
in plants where materials with similar characteristics are normally rolled.

REFERENCES 

1. Avila, A. F., 1998. Otimização da Produtividade de um Laminador Tandem a Frio. Tese de Mestrado.
EEUFMG. Depto. de Engenharia Metalúrgica. Belo Horizonte, Novembro.

2. Bland, D.R., Ford, H., 1948. The Calculation of Roll Force and Torque in Cold Strip Rolling with
Tensions. Proc. Inst. Mech. Eng., 159, 144-163.

3. Helman, H. et al., 1988. Fundamentos da Laminação - Produtos Planos. Publicação ABM, SP.

4. Helman, H., Cetlin, P.R., 1983. Fundamentos da Conformação Mecânica dos Metais. Guanabara Dois,
Rio de Janeiro.

5. Jamshidi, M., Vadiee, N., Ross, T., 1993. Fuzzy Logic and Control - Software and Hardware
Applications. PTR Prentice Hall, Englewood Cliffs, New Jersey.

6. Klir, G. J., Folger, T. A., 1988. Fuzzy Sets, Uncertainty and Information, Prentice-Hall, New York.

7. Pataro, C.D.M., Helman, H., 1997. Determinação Direta de Sequências de Passes na Laminação de
Produtos Planos, Mantendo Constante a Carga de Laminação. Anais do II Congresso Internacional de
Tecnologia Metalúrgica e de Materiais. São Paulo, SP.

8. Pataro, C.D.M., 1996. Execução Automática do Processo de Laminação, Utilizando Redes Neurais. Tese
de Doutorado, UFMG.

9. Pataro, C.D.M., Resende, P., Helman, H., 1995. Aplicação de Lógica Nebulosa na Laminação de
Produtos Planos. Anais do 50° Congresso Anual da ABM, pp. 395-404, São Pedro, São Paulo.

10. Pataro, C.D.M., Helman, H., 1997. Determinação de Aberturas entre Cilindros de Laminação, Via Lógica
Nebulosa. Anais do XIV Congresso Brasileiro de Engenharia Mecânica, Bauru, SP.

11. Pataro, C.D.M., Resende, P., Helman, H., 1994. Geração Automática de uma Sequências de Passes na
Laminação de Produtos Planos. Anais do Congresso Internacional de Tecnologia Metalúrgica, São
Paulo, SP.

12. Zadeh, L.A., 1965. "Fuzzy sets", Info. & Control, 8, 338-353.

13. Zadeh, L. A., 1973. Outline of a new approach to the analysis of complex systems and decision processes,
IEEE Trans. Syst., Man, Cybern., 3, 28-44.




Elongation-Control  Rolling  of  H-Shaped  Wire 


H. Utsunomiya, Y. Saito, M. Shinkawa and F. Shimaya


Division  of  Materials  Science  and  Engineering, 
Graduate  School  of  Engineering,  Osaka  University, 
2-1  Yamada-oka  Suita,  Osaka,  565-0871,  Japan 


ABSTRACT 

The  authors  propose  a  novel  technique  for  size-free  rolling,  termed  elongation-control  rolling.  This 
technique  is  characterized  by  the  active  use  of  interstand  forces,  which,  being  not  only  tensile  but  also 
compressive,  are  varied  over  a  wide  range.  In  the  present  study,  H-shaped  wires  are  formed  from  round 
wires  by  elongation-control  rolling  with  grooved  rolls.  The  elongation,  i.e.  nominal  strain  in  the  rolling 
direction,  is  controlled  over  a  wide  range,  from  20%  to  80%,  where  50%  is  tension-free  rolling.  The  flange 
width increases with compressive interstand force, and decreases with tensile force. It is concluded that the
rolling of H-shaped wires having an arbitrary flange width is made possible.


INTRODUCTION 

In the continuous rolling of bars, rods, and sections, the forces acting on materials between rolling stands
(interstand forces) must be negligibly small (tension-free) in order to obtain dimensional accuracy and
stable operation. The authors showed that interstand compressive forces promote metal flow and enable
effective forming of complicated cross-sections [1,2]. The authors propose a novel rolling technique termed
"elongation-control rolling" [3,4]. This method can generate an interstand force with a capacity to vary
widely from tensile to compressive. Thus the total elongation or the product size can be controlled, and
size-free rolling is made possible. The fundamental characteristics of rolling flat strips were investigated
[3]. The width of flat wires can be closely controlled [4]. In the case of square wire, an arbitrary
combination of side length and corner radius was obtained by the elongation-control rolling technique [5].

In  this  study,  elongation-control  rolling  was  applied  to  roll  H-shaped  wires.  H-shaped  wires  are  widely  used 
as  piston  rings  or  rails  in  electric  parts.  H-shaped  wires  of  arbitrary  flange  widths  were  successfully  rolled 
by  the  proposed  method. 


PROTOTYPE  MILL 

A schematic illustration of the prototype elongation-control mill that was constructed is shown in Figure 1. Details are
provided in references [1,2]. The mill consists of five cassette-type stands. Each stand was equipped with
two-high 100 mm diameter rolls, load cells for roll separating forces, a torque meter, and a tachometer.
Each stand was independently driven at a prescribed speed by a 2.2 kW servomotor.

All  stands  were  mounted  on  a  pair  of  slide  guides  (linear  guides)  and  could  move  smoothly  in  the  rolling 
direction.  Load  cells  were  inserted  between  the  adjacent  stands,  so  that  the  interstand  force  was  measured 
directly  and  precisely.  The  stands  were  all  pushed  against  each  other  while  affording  an  adequate  pre-load 
from  the  entrance  and  exit  stays.  The  distance  between  adjacent  roll  axes  was  230mm.  The  pair  of  guide 
shoes  and  side  guides  were  set  close  to  the  wire.  These  guides,  by  preventing  the  wire  from  buckling  or 
meandering,  enable  stable  rolling  in  the  case  of  large  compressive  interstand  force. 

EXPERIMENTAL  PROCEDURE 

Rolling experiments were performed at ambient temperature using round wires of three different materials:
fully annealed aluminum JIS A1070-O, partially hardened pure copper C3102, and medium carbon steel
S45C. All wires were 5 mm in diameter and 2 m in length. Prior to rolling, the wires were preformed into
round-edged flat wires by one-pass flat rolling. The width of the preformed wires was controlled to 5.9 mm to
fit the groove on the roll and to prevent meandering. The thickness of the preformed wire was 3.3 mm for
aluminum, 3.6 mm for copper, and 3.5 mm for carbon steel.

0-7803-5489-3/99/$ 10.00 ©1999 IEEE.


[Figure 1: plan view and front view of the mill, showing the stock, side guides, rolls, guide shoes, stays,
load cells, slide bearing and product, with stands No.1 to No.5 in line.]

Fig. 1. Schematic illustration of the elongation-control mill.


Rolling  stands  were  set  in  H-H  (horizontal)  arrangement.  The  roll  passes  used  are  shown  in  Figure  2. 
Closed  roll  passes  were  used  from  the  first  to  the  fourth  stands.  The  grooves  had  a  2  degree  draft  angle.  The 
reduction  in  web  thickness  at  each  stand  was  20%  and  total  reduction  was  60%.  The  fifth  stand  was  used  as 
an edger; the tops of the flanges were shaped by a symmetrical open pass. The edging draft (the reduced amount
of rib height or flange width) was set to either 0.25 or 0.5 mm. Rolling experiments were done at ambient
temperature under lubrication with mineral oil (IDEMITSU CU-50).
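
The per-stand and total reductions quoted above can be checked against each other. The sketch below assumes that the 20% reduction in web thickness applies at each of the four closed-pass stands and compounds multiplicatively; the function name is illustrative only.

```python
def total_reduction(per_stand_reduction, n_stands):
    """Total thickness reduction after n_stands passes,
    each reducing the current thickness by the same fraction."""
    remaining = (1.0 - per_stand_reduction) ** n_stands
    return 1.0 - remaining

# Four closed-pass stands at 20% each (the fifth stand is used as an edger).
r_total = total_reduction(0.20, 4)
print(f"total reduction = {r_total:.1%}")
```

Under this assumption the compounded value comes out just under 60%, in line with the total reduction stated in the text.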

Fig. 2. Roll passes used.

The interstand forces were generated by prescribing the roll speed of each stand as follows. First, the roll
speed of the first stand, $v_1^0$, was fixed at 1 m/min, and the roll speed of the i-th stand, $v_i^0$ (i = 2, ..., 5),
which established tension-free rolling, was determined experimentally, stand by stand, starting from upstream.
For rolling with interstand forces, the roll speed of the i-th stand, $v_i$, was set as

$$ v_i = a^{\,i-1}\, v_i^0 \qquad (i = 1, \ldots, 5) $$

where the parameter a is an arbitrary constant indicating the degree of imbalance of the roll speeds, which
determines the interstand forces. This parameter will be referred to as the roll speed parameter. When a = 1,
$v_i$ equals $v_i^0$ and tension-free rolling is achieved. If a > 1, $v_i/v_{i-1}$ is larger than $v_i^0/v_{i-1}^0$ and tensile
forces must be generated between adjacent stands. Conversely, if a < 1, compressive forces must be generated.
The prescribed speeds in the case of aluminum are shown in Figure 3.
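
A speed schedule of this kind can be sketched in a few lines. The example assumes that each stand speed is the tension-free speed scaled by a^(i-1), which is one rule consistent with the properties described above (first stand unchanged, speed ratio between adjacent stands scaled by a); the tension-free speeds are hypothetical values, not measured data.

```python
def prescribed_speeds(tension_free_speeds, a):
    """Return roll speeds v_i = a**(i-1) * v_i0 for stands i = 1..n.

    tension_free_speeds: speeds v_i0 (m/min) that give tension-free rolling.
    a: roll speed parameter (a > 1 tensile, a < 1 compressive interstand force).
    """
    # enumerate() starts at 0, which equals i - 1 for 1-based stand numbers.
    return [a ** k * v0 for k, v0 in enumerate(tension_free_speeds)]

# Hypothetical tension-free speeds for five stands, first stand at 1 m/min.
v0 = [1.00, 1.10, 1.22, 1.35, 1.40]
for a in (0.90, 1.00, 1.12):
    v = prescribed_speeds(v0, a)
    print(a, [round(x, 3) for x in v])
```

With a = 1 the schedule reduces to the tension-free speeds, and for any other a the ratio of adjacent stand speeds is a times the tension-free ratio.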


[Figure 3: roll speed plotted against stand number i.]

Fig. 3. Roll speeds at each stand.

During  the  rolling  test,  the  output  signals  of  the  interstand  forces,  the  roll  forces,  and  the  roll  torques  were 
measured  at  a  frequency  of  10Hz.  The  rolling  operation  was  interrupted  after  a  sufficient  distance  of  steady 
state  rolling  had  been  performed.  Then  the  elongation  (longitudinal  nominal  strain)  and  the  cross-section  at 
each  stand  were  measured. 

RESULTS  AND  DISCUSSIONS 

The  rolling  limits 

In the case of a > 1, the rib height was insufficient and edging could not be applied, so the results in this
case were obtained from rolling experiments without the fifth stand. There were upper limits on the roll speed
parameter a, which were determined by slippage between the rolls and the wire. The upper limit was a = 1.14
for aluminum, a = 1.12 for copper, and a = 1.04 for carbon steel. In the case of a < 1, the lower limits were
determined by web buckling of the wire. The lower limit was 0.88 for both aluminum and copper, and
0.92 for carbon steel. Therefore, stable rolling conditions were 0.90-1.12 for aluminum, 0.90-1.10 for
copper, and 0.92-1.04 for carbon steel.

Cross-sectional  profiles 

The cross-sections of aluminum wire for a = 0.90, 1.00, and 1.12 are compared in Figure 4. The ribs of the
wire grow as it passes through the rolling stands. A smaller a causes larger ribs and better filling of the
roll grooves. In the case of a = 1.12, edging could not be applied because the rib height was less than the
depth of the grooves. In the cases of a = 1.00 and 0.90, an edging draft of e = 0.25 resulted in underfilling
at the corners. Nearly complete filling was achieved in the case of e = 0.50, though slight overfilling of the
flanges was observed. However, the flanges were nearly parallel. It can therefore be concluded that
elongation-control rolling can closely control the flange width of H-sections.

Elongation 

The relationship between elongation and the roll speed parameter a is shown in Figure 5 for the case of
e = 0.25. The relationship is approximately linear. The elongation was varied over a wide range, from 20 to 80%,
where 50% corresponds to tension-free rolling. The effects of the material and of edging on the elongation are
less apparent.


[Figure 4: cross-sectional profiles of the wire after stands No.1 to No.5 (No.5 with e = 0.25 and e = 0.50)
for a = 0.90, 1.00 and 1.12, each annotated with its cross-sectional area in mm².]

Fig. 4. Variations in cross-sections as a function of roll speed parameter.

[Figure 5: elongation plotted against roll speed parameter a.]

Fig. 5. Relationship between elongation and roll speed parameter.


Interstand  forces 

The distribution of interstand forces during rolling of aluminum wires is shown in Figure 6. Interstand
tensile forces are generated when a > 1; compressive forces are generated when a < 1. These forces exhibit
a concave distribution. Interstand forces at intermediate stands depend strongly on the roll speed parameter.
In the case of a > 1, no interstand force is generated between the fourth and fifth stands because
edging is not performed at the fifth stand. In the case of a < 1, an interstand force is generated between the
fourth and fifth stands due to edging, though it does not depend on a. This force may be due to
slippage at the fifth stand. Edging was found to have a negligible effect on the roll forces at the upstream
stands.




Fig.  6.  Interstand  forces  as  a  function  of  roll  speed  parameter. 


CONCLUSION 

In this study, elongation-control rolling was applied to the rolling of H-shaped aluminum, copper, and
medium carbon steel wires. H-shaped wires were rolled from round wire by five-stand grooved
rolling. The elongation was varied over a wide range, from 20% to 80%, where 50% corresponds to tension-free
rolling. Flange width increases with compressive interstand force and decreases with tensile force. It is
concluded that the proposed technique makes it possible to roll H-shaped wires of arbitrary flange width.


ACKNOWLEDGEMENT 

This  study  was  supported  by  the  Ministry  of  Education,  Science  and  Culture  of  Japan  under  grant  number 
(A)(2)-07505019. 


REFERENCES 

1. H. Utsunomiya, Y. Saito, K. Hirata, T. Kawamoto and K. Oka, 1996. Rolling of profiled wires by the
satellite mill, Adv. Tech. of Plasticity, 1, 99-102.

2. H. Utsunomiya, Y. Saito, T. Kawamoto and H. Matsuzawa, 1998. Satellite-mill rolling of U-shaped and
H-shaped wires, J. Mat. Proc. Tech., 80-81, 345-350.

3. Y. Saito, H. Utsunomiya, M. Shinkawa and F. Shimaya, 1998. Development of an elongation-control
mill, J. Mat. Proc. Tech., 80-81, 351-355.

4. Y. Saito, H. Utsunomiya, M. Shinkawa and K. Oka, 1998. A novel rolling technique for size-free rolling,
Proc. 7th Int. Conf. on Steel Rolling, 811-815.

5. H. Utsunomiya, M. Shinkawa and Y. Saito, 1999. Elongation-control rolling of square wire (to be
published in Proc. 6th Int. Conf. on Tech. of Plasticity).




Application  of  a  Neural  Network  to  Speed  Up 
a  Mathematical  Model  to  Calculate  Strip  Profiles  in  Flat  Rolling 

Yukio  Shigaki,  Horacio  Helman 

Universidade  Federal  de  Minas  Gerais,  Department  of  Metallurgy  and  Materials, 

R.  Espirito  Santo  35,  30160-030  Belo  Horizonte,  MG,  Brazil 
Fax: (5531) 238-1815  Email: shigaki@cce.ufmg.br, hhelman@demet.ufmg.br


ABSTRACT 

Since the sixties, innumerable mathematical models [1-5] have been developed to simulate flat
rolling and, specifically, to determine the width-wise profile of the strip. These models have great utility in
understanding the influence of parameters such as friction, front and back tension, gap, material properties,
etc. in the rolling of flat strips. Among these, Pawelski, Rasp and others [4,5] have developed a precise model
that accounts for the influence of bending, shearing and flattening of the rolls, which are crucial for
calculating emergent strip profiles accurately.

This method divides the strip and roll into many stripes, assuming a plane-strain state. For each stripe of the
roll and strip, the load and deformation, respectively, are calculated using an analytical approach such as the
Bland-Ford-Ellis model with Hitchcock's formula for the deformed radius of the roll. The effects of bending,
shearing and flattening are considered through influence coefficients on the rolls. Though good agreement
is achieved between the results of this method and those obtained by experiment, the program run-time is so
long that it must be considered an off-line system. Since the program works iteratively, and since
calculation of the influence coefficient matrix for flattening is time-consuming (it must be updated
at nearly every iteration), program speed can be improved by substituting a trained neural
network (inputs: distributed loads; outputs: flattening of the rolls), working as an equal partner in the overall
mathematical model.
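
As an illustration of the per-stripe calculation, Hitchcock's formula for the deformed roll radius can be written as a short function. This is a minimal sketch in its commonly quoted form, R' = R [1 + C P / (w Δh)] with C = 16(1 - ν²)/(πE); the default elastic constants (steel rolls) and the numerical inputs are assumptions for the example and are not taken from the paper.

```python
import math

def hitchcock_radius(R, force, width, draft, E=210e9, nu=0.3):
    """Deformed (flattened) roll radius by Hitchcock's formula.

    R: undeformed roll radius (m); force: roll force (N);
    width: strip (or stripe) width (m); draft: thickness reduction (m);
    E, nu: Young's modulus (Pa) and Poisson's ratio of the roll (assumed values).
    """
    C = 16.0 * (1.0 - nu ** 2) / (math.pi * E)
    return R * (1.0 + C * force / (width * draft))

# Hypothetical pass: 250 mm roll radius, 5 MN over a 1 m wide strip, 0.5 mm draft.
R_def = hitchcock_radius(0.25, 5.0e6, 1.0, 0.5e-3)
print(f"deformed radius = {R_def * 1000:.1f} mm")
```

Because the deformed radius depends on the load and the load depends on the radius, a model of this type has to iterate, which is consistent with the run-time concern raised above.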

The neural network can also be trained in the inverse direction, making possible very fast "inversion" of the
flattening matrix. This is very important for rolling mills with more than two rolls, since calculation of
roll contact loads requires inversion of the flattening matrices. This combined model has better acceptance
since it does not appear as a black box, i.e., a model based on neural networks only, and so it can be adapted
for on-line process control. Two feed-forward neural networks were designed to cope with the problem of
calculating flattening and loading: one for load-to-flattening and the second for inversion. A
back-propagation learning rule was used. The training examples were taken from each iteration step for different
reduction cases, with a fixed strip width, initially for a two-high mill and subsequently for a four-high mill.
A substantial reduction in processing time is obtained without loss of precision, since the flattening
calculation step is replaced by a simple sum of polynomials with an appropriate activation function.
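
The idea of replacing the flattening calculation with a network evaluation can be illustrated by a single forward pass. The sketch below is not the authors' network: the layer sizes, the tanh activation and all weights are hypothetical placeholders, and a real network would obtain its weights from back-propagation training on the iteration data described above.

```python
import math

def forward(x, W1, b1, W2, b2):
    """One-hidden-layer feed-forward pass: y = W2 * tanh(W1 * x + b1) + b2."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

# Hypothetical, untrained weights: 3 stripe loads in, 2 hidden units, 3 flattening values out.
W1 = [[0.1, 0.2, 0.1], [0.0, -0.1, 0.3]]
b1 = [0.0, 0.1]
W2 = [[0.5, 0.1], [0.2, 0.2], [0.1, 0.5]]
b2 = [0.0, 0.0, 0.0]

loads = [1.0, 1.2, 1.0]  # normalized distributed loads, one per stripe
flattening = forward(loads, W1, b1, W2, b2)
print(flattening)
```

A forward pass costs only a few multiply-add operations per output, which is the source of the speed-up over recomputing the influence coefficient matrix at each iteration.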

Keywords:  Flat  rolling,  widthwise  profile,  neural  networks,  on-line  control. 

REFERENCES 

1. Bryant, G.F., ed., 1973. Automation of tandem mills, The Iron and Steel Institute, London.

2. Guo, R.M., 1990. Development of a mathematical model for strip thickness profile, Iron & Steel Eng.,
32-39.

3. Ishikawa, T., Tozawa, Y., Nakamura, M., Kato, T., 1980. Fundamental study on the profile and shape of
the rolled strip, Proc. Int. Conf. on Steel Rolling, Tokyo, 772-783.

4. Pawelski, O., Teutsch, H., 1985. A mathematical model for computing the distribution of loads and
thickness in the width direction of a strip rolled in four-high cold-rolling mills, Engineering Fracture
Mechanics, 21(4), 853-859.

5. Pawelski, O., Rasp, W., Rieckmann, J., 1989. A mathematical model for predicting the influence of
elastic and plastic deformations on strip profile in six-high cold rolling, 4th Inter. Steel Rolling Conf.,
2 (E.3.1-E.3.6).




Intelligent  Methods  in  Metal  Forming  Processes 




A  Fundamental  Study  of  Incremental  Deep  Drawing  Process 

Susumu  Shima,  Hidetoshi  Kotera  and  Kei  Kamitani 

Department  of  Mechanical  Engineering,  Kyoto  University, 

Sakyo-ku,  Kyoto  606-8501,  Japan 


ABSTRACT 

This paper deals with an investigation into the features of an incremental deep drawing process. On a newly
developed incremental deep drawing set-up, aluminium sheets are formed into circular cups of various sizes.
Deep-drawing is carried out incrementally with a set of tools of common shapes. The process parameters
studied are the drawing ratio, the formed cup size, and the vertical and horizontal displacements of the punch
in one step relative to the blank. It is shown that the limiting drawing ratio (LDR) depends on the cup size
and that fracture occurs in the blank near either the punch shoulder or the die shoulder, depending on the
forming conditions. A fracture mode diagram is thus obtained, in which the regions of successful deep-drawing
and of fracture at either portion of the blank are clearly seen. Strain distributions are measured by a scribed
circle method with the aid of a common fabrication process for photolithography.


INTRODUCTION 

Incremental deep-drawing is studied in an attempt to develop a new sheet metal forming process for small-batch
production. In our previous work [1], forming was done manually, while in this work
we built a new set-up with automatic control of the movement of the punch and blank.

In recent years, there has been demand for metal forming systems flexible enough to handle small-batch
production of a large variety of products. Among other attempts, research and development of
incremental forming processes has been intensively undertaken [2-8]. Incremental forming is a general
term for those forming processes in which tools of common shapes are used to deform small portions of the
workpiece consecutively, one next to another, to obtain a desired shape, instead of die-sets
designed exclusively for particular product shapes. Although the time required for making one
product is much longer than by ordinary press forming, the incremental process may be viable in view of the
whole process, including the design and fabrication of dies.

In previous work [1], we investigated the deformation and fracture behaviour of the blank by changing the
process parameters involved: drawing ratio, increment in punch displacement, cup diameter to be formed,
etc. We showed the following:

1. Incremental deep-drawing is successfully performed below a deep-drawing ratio of about 1.5.

2.  The  counter  punch  supporting  the  blank  against  the  main  punch,  is  useful  to  improve  deep-drawability. 

3.  Strain  distribution  in  the  blank  and  thus,  fracture  occurrence,  depends  on  the  cup-size  to  be  deep-drawn. 

4.  Fracture  occurs  near  the  punch  shoulder  when  the  cup  is  large,  while  it  occurs  near  the  die  shoulder  when 
the  cup  is  small. 

In this study, we modify the previous set-up so that the operation is done with a sequence controller. We
then carry out similar experiments to investigate the effects of the above process parameters on the
characteristics of incremental deep-drawing. We also measure strain distributions in the formed cup by a
scribed circle method; we apply the circles by utilizing a common fabrication process for photolithography.


INCREMENTAL  DEEP-DRAWING 

Since the principles and details of the process are described elsewhere [1], we describe them only briefly.
Unlike conventional deep-drawing, we use a few tools of common shapes to produce various shapes. As
shown in Figure 1, when we deep-draw a circular cylinder, we use a straight-shaped die and a punch with a
blank holder. We push the punch against one portion of the blank by a small displacement, with a blank
holding plate pushing against the flange portion. After each displacement of the punch toward the blank, the
punch and blank holder are both moved upward to free the blank. The blank is then rotated by a small angle
about the vertical axis through its centre, followed by the punch displacement for the next step. The
punch thus moves in a spiral manner relative to the blank. If n steps are required for one rotation of the
blank, the pitch p is given by $p = \pi d / n$ and the lead L by $L = n s = \pi d s / p$, where d is the cup
diameter and s is the punch stroke in one step. Up to the N-th step, the total vertical displacement of the
punch, measured from the top surface of the initial blank set on the set-up, is expressed by

$$ S_N = s_0 + (N - 1)\, s $$

where $s_0$ is the punch displacement at the first step. Deep-drawing is thus performed incrementally and
consecutively to produce a desired shape.
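
The relations between pitch, lead and cumulative punch displacement can be evaluated directly. The sketch below uses illustrative values chosen to lie within the ranges reported in the experimental section (assuming the stroke is in micrometres and the pitch in millimetres); the first-step displacement s0 is a hypothetical value.

```python
import math

def pitch(d, n):
    """Circumferential pitch p = pi*d/n for n steps per blank rotation."""
    return math.pi * d / n

def lead(d, s, p):
    """Lead L = n*s = pi*d*s/p (vertical punch advance per blank rotation)."""
    return math.pi * d * s / p

def total_displacement(s0, s, N):
    """Total vertical punch displacement S_N = s0 + (N - 1)*s up to step N."""
    return s0 + (N - 1) * s

d = 96.0     # cup diameter, mm (illustrative)
s = 0.0106   # punch stroke per step, mm (10.6 micrometres)
p = 3.77     # pitch, mm
n = math.pi * d / p          # steps per rotation implied by d and p
L = lead(d, s, p)
print(f"n = {n:.1f} steps per rotation, L = {L:.3f} mm")
print(f"S_N after 1000 steps = {total_displacement(0.5, s, 1000):.2f} mm")  # s0 = 0.5 mm is hypothetical
```

With these inputs the lead comes out close to 0.85 mm, i.e. of the same order as the blank thickness.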


[Figure 1: sequence of process steps; panel titles include (c) free blank and (d) rotate blank.]

Fig. 1. Process of incremental deep-drawing.


In conventional deep-drawing, the LDR (limiting drawing ratio) is a common measure of formability or
deep-drawability. Similarly, in incremental deep-drawing, we evaluate formability by the LDR. As will be shown
in a later section, the LDR, as well as fracture occurrence, is influenced by the size of the formed cup. We
investigate the process by introducing the following measures in addition to the DR (drawing ratio = D/d):

a) Specific cup height: h/d.

b) Flange shrinking ratio: Df/D

where D is the initial blank diameter, Df is the flange diameter during forming, d is the inner diameter of the
formed cup and h is its height. We evaluate the specific cup height and flange shrinking ratio for cases
where the cups being formed undergo fracture.
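
These three measures are simple ratios, and can be computed as in the sketch below. The input dimensions are those listed for condition (a) of the punch-force experiment later in the paper (D = 112.0 mm, Df = 106.0 mm, d = 77.5 mm, h = 16.0 mm); the function names are illustrative.

```python
def drawing_ratio(D, d):
    """DR = initial blank diameter / inner diameter of the formed cup."""
    return D / d

def specific_cup_height(h, d):
    """h/d = cup height relative to cup inner diameter."""
    return h / d

def flange_shrinking_ratio(Df, D):
    """Df/D = current flange diameter relative to initial blank diameter."""
    return Df / D

D, Df, d, h = 112.0, 106.0, 77.5, 16.0  # mm
print(f"DR   = {drawing_ratio(D, d):.3f}")
print(f"h/d  = {specific_cup_height(h, d):.3f}")
print(f"Df/D = {flange_shrinking_ratio(Df, D):.3f}")
```

The drawing ratio obtained this way reproduces the DR value tabulated for that condition, which supports reading DR as D/d.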


EXPERIMENTAL  SET-UP  AND  PROCESS 

The experimental set-up developed for the present study is shown schematically in Figure 2. The main
punch is displaced by a ball screw jack that is actuated by a stepping motor. An air cylinder actuates the
blank holder. The blank is rotated about its center through a prescribed angle by three pairs of rollers, as
depicted in Figure 3. These steps are repeated one after another using a sequence controller, so that
incremental deep-drawing is performed automatically. For the present process, the diameter of the punch
is 50 mm with a shoulder radius of 5 mm. We therefore deep-draw cylindrical cups with diameters above
50 mm.


Circular  blanks  of  commercially-pure  aluminium,  A1050-O,  of  ~140mm  diameter  with  1.0mm  thickness 
were  used.  Since  we  had  previously  confirmed  that  a  counter  punch  was  unnecessary  for  successful  deep 
drawing,  we  did  not  use  it  in  the  present  set-up.  The  blank  holding  force  was  430N  and  was  kept  constant 
throughout  the  experiment.  Molybdenum-disulfide  was  used  as  a  lubricant  on  both  sides  of  the  blank. 


[Figure 2: labelled parts are the ball screw jack, stepping motor, air chuck, punch, air cylinder, blank,
die set, die, counter punch, blank holder and outer frame.]

Fig. 2. Schematic view of incremental deep-drawing set-up.

Fig. 3. Arrangement of three pairs of rollers.

EXPERIMENTAL  RESULTS  AND  DISCUSSION 

Effect  of  punch  stroke,  pitch  and  lead  on  forming  process 

In incremental deep-drawing, there are characteristic parameters such as the punch stroke s, the pitch p and
the lead L. The effects of these parameters on the specific cup height h/d and the flange shrinking ratio Df/D
were first examined with the other conditions fixed; the blank diameter in this case was 138 mm. While the
drawing ratio DR was chosen between 1.408 and 1.438, the punch stroke in one step s was changed from 10.6 to
47.6 µm, the pitch p from 3.77 to 9.23 mm, and the lead L from 0.525 to 2.00 mm. The results were as follows:

1. For DR = 1.422-1.438 and s = 10.6-15.9 µm, with p = 3.77-9.23 mm and L = 0.525-0.848 mm, both the
specific cup height h/d and the flange shrinking ratio Df/D are almost constant; h/d = 0.124 and Df/D =
0.949.

2. When DR is smaller (DR = 1.408), the specific cup height h/d increases to 0.152 and Df/D decreases to
0.812, while the other parameters above are more or less the same.

3. When the punch stroke in one step s becomes larger at the same drawing ratio DR as in 1) above,
deep-drawing is possible, but the surface of the cup becomes bumpy.

If DR is much larger than LDR, the specific height h/d is almost zero and Df/D is almost unity. If DR
decreases to approach LDR, then the former increases, while the latter decreases. From the results above,
h/d and Df/D, and hence the deep-drawing process, are not influenced by the parameters s, p and L for
DR = LDR, although forming with larger s or L may provide products with bumpy surfaces. So in the
following, we chose the lead L to be as large as the thickness of the blank.

Fracture  mode  diagram 

As observed in our previous work [1], the LDR in incremental deep-drawing depends on the diameter of the
formed cup. To examine this in more detail, we carried out experiments by changing the blank diameter D
from 96 to 143 mm and the diameter of the formed cup d from 58 to 109.5 mm. For conventional
deep-drawing, a size effect on LDR is observed, but for the above conditions, LDR is more or less constant [9].
Figure 4 shows the experimentally derived drawing ratio plotted against cup diameter. In the diagram, J
refers to the case of successful deep-drawing and I and H refer to fracture occurrence at the cup shoulder
and near the die shoulder, respectively. This figure demonstrates that the forming conditions can be divided
into three regions: the first, where successful deep-drawing is achieved; the second, where fracture occurs
in the blank near the punch shoulder; and the third, where fracture occurs in the blank near the die shoulder.
The first region lies below the LDR. In the region above the LDR, fracture occurs in the blank
near the cup shoulder for larger cup diameters, while it occurs near the die shoulder for smaller ones. The
boundary line between the condition for fracture at the cup and that for fracture at the die may be called the
parting line for fracture mode. The diagram shows the main features of incremental deep-drawing and can
be called the fracture mode diagram. This is discussed in the next section.


Fig. 4. Fracture mode diagram.

Fig. 5. Contact areas between die, blank and punch.

Figure 5 illustrates the contact areas between the die, blank and punch. The contact area at the die
shoulder (area A in Fig. 5) is larger if the formed cup diameter is larger, while the contact area at the punch
shoulder (area B) becomes larger as d approaches the constant punch diameter. The loads supported by
these two contact areas are almost equal to each other during deep-drawing because the friction at the
punch/cup interface is negligibly small. If there is a difference between the two areas, the blank may
fracture at whichever contact area is smaller, since the smaller area results in a larger stress in the blank.
This leads to the conclusion that the forming limit is at its maximum under the condition where the two contact
areas are equal to each other. The cup diameter at this condition may be called the optimum cup diameter for a
particular punch diameter. Figure 4 demonstrates this phenomenon and shows an optimum cup diameter of ~65 mm
for the present condition. As seen above, we must pay attention to the cup diameter when evaluating formability
in incremental deep-drawing. To increase formability, the formed cup diameter must be chosen so that it is near
the optimum diameter.
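
The contact-area argument can be made concrete with a simple comparison of mean stresses. The sketch below assumes, as in the text, that both contact areas carry the same load; the load and area values are hypothetical, and the function is only an illustration of the reasoning, not a fracture criterion from the paper.

```python
def likely_fracture_site(load, area_die_shoulder, area_punch_shoulder):
    """Compare the mean stresses at the two contact areas carrying the same load.

    Returns the location with the higher mean stress and that stress.
    """
    stress_die = load / area_die_shoulder
    stress_punch = load / area_punch_shoulder
    if stress_die > stress_punch:
        return "die shoulder", stress_die
    if stress_punch > stress_die:
        return "punch shoulder", stress_punch
    return "balanced", stress_die

# Hypothetical small cup: contact area at the die shoulder (A) smaller than at the punch shoulder (B).
print(likely_fracture_site(1000.0, 20.0, 40.0))
# Hypothetical large cup: area A larger than area B.
print(likely_fracture_site(1000.0, 60.0, 30.0))
```

The balanced case, where neither area is overloaded relative to the other, corresponds to the optimum cup diameter described above.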

Strain  measurement  by  scribed  circle  method 

While there are various ways to scribe circles on the blank surface, we utilized the photo-resist method
commonly used to fabricate semiconductors. The circles were 1 mm in diameter and the process used was:

1. Coat negative-resist liquid on the blank surface using a spin coater so that a thin uniform film is obtained.

2. Pre-bake the blank so that the coated liquid becomes solid.

3. Place a pattern mask that gives the scribed circle patterns on the coated surface, followed by exposure.

4. Develop the pattern, removing the unnecessary resist so that the scribed circles remain on the blank surface.

5. Post-bake to fix the circles.
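
Once the scribed circles have deformed into ellipses, strains can be evaluated from their axes. The sketch below computes logarithmic strains from the deformed axis lengths and, as an assumption not stated in the paper, obtains the thickness strain from volume constancy; the deformed dimensions are hypothetical.

```python
import math

def circle_strains(d0, d_radial, d_circ):
    """Logarithmic strains from a scribed circle of initial diameter d0.

    d_radial, d_circ: deformed axis lengths in the radial and circumferential
    directions. Thickness strain follows from volume constancy (assumed).
    """
    eps_r = math.log(d_radial / d0)
    eps_theta = math.log(d_circ / d0)
    eps_t = -(eps_r + eps_theta)
    return eps_r, eps_theta, eps_t

# 1 mm circle stretched radially and compressed circumferentially (hypothetical values).
eps_r, eps_theta, eps_t = circle_strains(1.0, 1.20, 0.90)
print(f"radial {eps_r:.3f}, circumferential {eps_theta:.3f}, thickness {eps_t:.3f}")
```

A positive radial strain with a negative circumferential strain is the pattern expected in a flange being drawn inward.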

Figure  6  shows  measured  distribution  of  radial  strain,  circumferential  strain  and  thickness  strain.  The 
forming  condition  shown  in  the  figure  refers  to  the  case  where  fracture  occurred  at  the  flange  near  the  die 
shoulder.  The  diagram  shows  that  strain  is  concentrated  near  the  cup  shoulder  and  the  blank  near  the  die 
shoulder.  Although  not  shown  here,  the  strain  was  concentrated  more  significantly  at  the  cup  shoulder  for 


[Figure 6: measured distributions of radial, circumferential and thickness strain (vertical axis: logarithmic
strain). Forming condition as legible: D = 102.5 mm, t = 1.0 mm, h = 18.8 mm.]

      dp [mm]  rd [mm]  D [mm]  t [mm]  Df [mm]  d [mm]  h [mm]  DR
(a)   50       5        112.0   1.0     106.0    77.5    16.0    1.445
(b)   50       5        96.0    1.0     84.0     64.0    18.9    1.50

Fig. 7. Change in punch forces with height of deep-drawn cup.




Forming  Load 

We measured the forming load by affixing strain gauges to the punch. Figure 7 shows the change in punch
force with formed cup height. The conditions are shown below the diagram: a) refers to the condition where
fracture occurred at the cup shoulder, while b) refers to the case where no fracture occurred. In the present
process, forming is done intermittently, so the recorded load curve is a wave whose period is the cycle time
of one forming step; only the peak load of each wave is plotted in the graph. The punch force
increases with the progress of incremental deep-drawing, reaches its maximum and then decreases at the
end of forming. The peak load is very small in both cases, but that in case a) is larger than that in case b),
although the drawing ratio is smaller. This results in the occurrence of fracture at a lower drawing ratio. The
decrease in the punch force after the peak load in condition a) reflects the occurrence of necking in the blank
at the cup shoulder, while that in b) reflects the decrease in flange width as deep-drawing progresses.
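
Reducing the intermittent load record to one peak value per forming step can be sketched as follows. The example assumes a fixed number of samples per step and uses a made-up load record; neither the sampling scheme nor the numbers come from the paper.

```python
def peak_load_per_step(samples, samples_per_step):
    """Split a load record into consecutive forming steps and keep each step's peak."""
    return [max(samples[i:i + samples_per_step])
            for i in range(0, len(samples), samples_per_step)]

# Hypothetical punch-load record (N), four samples per forming step.
record = [0, 40, 55, 10, 0, 50, 70, 12, 0, 45, 60, 8]
peaks = peak_load_per_step(record, 4)
print(peaks)
```

Plotting such peak values against the cup height reached at each step gives a curve of the kind described for Figure 7.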


CONCLUSIONS 

An incremental deep-drawing process was performed on commercially-pure aluminium using a die and
punch of common shapes. We were thus able to understand the features of the present process. The results
are as follows.

1) Incremental deep-drawing is successfully performed depending on the forming condition. Within the
present experiments, the limiting drawing ratio is about 1.5 to 1.6.

2) In evaluating forming, unlike conventional deep-drawing, we must pay attention to the size of the formed
cup in relation to the punch size.

3) A forming limit diagram on the plane of drawing ratio versus cup diameter is thus obtained. The diagram
is divided into three zones: the first where incremental deep-drawing provides sound products;
the second where fracture occurs in the blank near the cup shoulder; the third where fracture takes place
near the die shoulder. This can be called the fracture mode diagram.

4) There is an optimum cup diameter that provides the highest drawing ratio, near the boundary between these
two fracture conditions.

5) Employing a common fabrication process for semiconductors, we scribed circles on the blank surface
and measured the strain distribution after forming. In a successfully formed cup, while strains are larger in
the blank near both the cup and die shoulders, the flange undergoes sufficient strain to be deep-drawn.

6) The punch force is very small. It increases with increasing punch stroke and then decreases.
The peak force for the case where fracture occurs is larger than in the case of successful drawing, even if
the drawing ratio is lower.


REFERENCES 

1.  S.  Shima,  H.  Kotera,  Kamitani,  K.  and  T.  Bando,  1998.  Development  of  Incremental  Deep  Drawing 
Process.  Metals  &  Materials,  4(3),  404-407. 

2.  S.  Matsubara,  1994.  CNC  Incremental  Forming.  J.  Japan  Soc.  Tech.  Plasticity,  35(406),  1258-1263. 

3.  FI.  Iseki,  K.  Kato  and  S.  Sakamoto,  1992.  Flexible  and  Incremental  Sheet  Metal  Bulging  using  a  Path- 
Controlled  Spherical  Roller.  Tr.  JSME  (C),  58(554),  3147-3152. 

4.  K. Kitazawa, A. Wakabayashi, K. Murata and J. Seino, 1994. A CNC Incremental Sheet Metal Forming Method Producing the Shell Components Having Sharp Corners. J. Japan Soc. Tech. Plasticity, 35(406), 1348-1353.

5.  T.  Hasebe  and  S.  Shima,  1994.  A  Study  of  Flexible  Forming  by  Hammering.  J.  Japan  Soc.  Tech. 
Plasticity,  35(406),  1323-1329. 

6.  T.  Hasebe,  S.  Shima  and  Y.  Imaida,  1996.  A  Study  of  Flexible  Forming  by  Progressive  Hammering. 
Adv.  Tech.  Plasticity,  Proc.  5th  Int.  Conf.  Tech.  Plasticity,  Vol.II,  951-954. 

7.  S.  Shima,  H.  Kotera  and  H.  Murakami,  1997.  Development  of  Flexible  Spin-Forming  Method.  J.  Japan 
Soc.  Tech.  Plasticity,  38(440),  814-818. 

8.  S. Shima, H. Kotera, H. Murakami and N. Nakamura, 1996. Development of Flexible Spinning - A Fundamental Study -. Adv. Tech. Plasticity, Proc. 5th Int. Conf. Tech. Plasticity, Vol. II, 557-560.

9.  M.  Nakano,  Y.  Ueno  and  S.  Kanehara,  1972.  Size  and  Shape  effect  on  Sheet  Metal  Forming.  J.  Japan 
Soc.  Tech.  Plasticity,  13(141),  745-750. 




Intelligent  Design  Architecture  for  Process  Control  of  Deep-Drawing 

K.  Manabe*,  H.  Koyama*,  K.  Katoh**  and  S.  Yoshihara*** 

*Tokyo  Metropolitan  University  Department  of  Mechanical  Engineering, 

Hachioji-shi,  Tokyo  192-0397,  Japan 

** Integrated  Systems  Japan,  Ltd.  Asahi  Bank  Gotanda  Bldg.  1-23-9  Nishi-gotanda, 
Shinagawa-ku,  Tokyo  141-0031,  Japan 
***Tokyo  National  College  of  Technology  Dept.  Mechanical  Engineering, 
Hachioji-shi,  Tokyo  193-8610,  Japan 

ABSTRACT 

A design architecture with a database is proposed for an intelligent sheet metal forming system, so that a process control system can be designed without experts who are skilled and experienced in the forming process. In this study, the proposed architecture is applied to the variable blank holding force (BHF) control technique for circular-cup deep-drawing. The system supports three objective functions that represent typical process requirements: cup wall uniformity, cup height improvement and energy saving. The applicability of this design architecture is confirmed by experiments on aluminum alloy sheets.

INTRODUCTION 

Several studies on the optimization of process control in metal forming have been performed as part of an approach toward intelligent processing. For sheet stamping operations, intelligent deep-drawing techniques have been developed to date. One technique is an adaptive control method that adjusts the blank holding force (BHF) by fuzzy inference for circular-cup deep-drawing [1]. Another is a control approach based on a plastic deformation model, in which the material and friction conditions are identified with an artificial neural network (ANN) [2]. Despite their advantages, each control system requires extensive time and labor for design and development. Above all, the design engineer has to be a knowledge expert as well as a skilled and experienced engineer in forming process techniques; otherwise, the assistance of a craftsman is essential. Therefore, it is necessary to establish a new concept for process design architecture that obviates the requirement for an expert. In general, the forming cell and system must be efficiently designed during process design and process control. In the former system, as shown in Fig. 1 (left), the expert plays a number of roles as the core.


Fig.  1 A  new  intelligent  approach  for  various  design  phases  on  metal  forming  processes 


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




He also acquires the required experience and transfers the knowledge to inexperienced engineers. The number of experts has been gradually decreasing over the years. Thus, in the future system, the process conditions must be automatically optimized without the aid of an expert. The purpose of this study is to remove the dependency on engineering experts in the design phases of process planning and control design by developing an intelligent design architecture for deep-drawing process control that works without the aid of a knowledge expert.


OUTLINE  OF  A  NEW  INTELLIGENT  PROCESS  DESIGN  AND 
ITS  SYSTEM  ARCHITECTURE 

Our  concept  shown  in  Fig.  1  (right)  involves  the  replacement  of  the  brain  functions  of  an  expert  by  a 
processor  which  contains  an  analyzer,  database  and  knowledge  base.  The  processor  can  design  and  control 
the  process  according  to  a  suitable  set  of  rules  and  algorithm  from  the  database  and  knowledge  base,  and 
stores  the  sensing  information  from  a  forming  cell  during  the  process,  which  is  similar  to  the  experience  of 
an  expert.  In  other  words,  it  can  grow  by  acquiring  experience  in  the  same  manner  as  an  expert.  Hence,  the 
proposed  system  is  able  to  automatically  optimize  the  process  and  can  be  operated  without  any  aid  from  an 
expert. 

Figure  2  shows  the  outline  of  the  system  architecture  based  on  the  above  concept.  It  can  be  broadly  divided 
into two parts: a processor, and a forming cell and system. The forming cell has
several  sensors  for  supplying  process  information  to  the  processor,  and  also  has  actuators  to  implement  the 
commands  from  the  processor.  The  processor  consists  of  a  database,  knowledge  base  and  an  analyzer 
(commercial  control  design  support  tool;  MatrixX).  The  database  and  knowledge  base  contain  the  process 
information  under  various  conditions  and  the  methodology  for  designing  the  process,  respectively.  The 
processor  is  capable  of  not  only  designing  the  process  using  the  database  and  knowledge  base  but  also 
identifying  the  material  properties  of  the  workpiece  and  control  actuator  using  sensing  information  from  the 
sensors.  In  addition,  the  system  can  handle  a  variety  of  workpieces  as  well  as  the  change  of  workpiece 
material,  tooling  conditions  and  lubricating  conditions,  by  utilizing  the  database  and  knowledge  base. 

APPLICATION  OF  THE  ARCHITECTURE  TO  DEEP-DRAWING  PROCESS 

In  this  study,  the  circular-cup  deep-drawing  problem  is  adopted  as  a  fundamental  and  important  example  of 
the  sheet  metal  forming  process.  In  the  deep-drawing  process,  the  forming  limit  is  mainly  governed  by  the 
fracture  at  the  punch  shoulder  and  the  wrinkle  at  the  flange  part.  Although  it  is  essential  to  apply  the  BHF 
to  avoid  wrinkles,  excessive  BHF  causes  fractures.  Therefore,  the  appropriate  amount  of  BHF  is  required  to 
carry out the process successfully. The new design architecture is therefore applied to the adaptive control of BHF in the deep-drawing process in order to verify its applicability. Figure 3 shows the design
system  architecture  for  an  intelligent  metal  forming  process  with  a  database.  In  the  system,  fuzzy  inference 
was  chosen  as  an  AI  tool  for  process  control  design. 


Fig. 2  A concept of intelligent metal forming cell with database (blocks: processor; forming cell and system; phases: process design; process control)




Fig.  3  System  architecture  of  intelligent  forming  process  for  deep-drawing  process 


The evaluation functions should not be influenced by the blank material, tooling conditions, environmental conditions and other factors. For this reason, the evaluation functions φ and φ', obtained from the punch stroke curve, and ψ and ψ', obtained from the maximum apparent blank thickness curve, are used. The evaluation function φ is the difference between the actual punch stroke curve and the ideal curve, which can be obtained geometrically by assuming uniform wall thickness. φ' is the derivative of φ with respect to the blank reduction ratio ΔDR*. A combination of φ and φ' is used for fracture estimation. The evaluation function ψ is the blank holder displacement, which is equal to the blank thickness at the flange edge and is used instead of the wall thickness distribution. In the same manner as φ, a combination of ψ and ψ' is used to evaluate wrinkle behavior. A constraint function χ is defined as the derivative of the punch load curve and is used to evaluate the progress of the process.

The database in this study is composed of four kinds of process variables: punch stroke, punch load, maximum apparent thickness, and ΔDR*. The blank reduction ratio ΔDR* can be obtained from the displacement of the flange edge and is given by

$$\Delta DR^{*} = \frac{s}{R_0}$$

where R0 is the initial blank radius and s is the displacement of the flange edge. These process data are used to design appropriate sets of membership functions for the evaluation functions, so they have to be accumulated under various material and process conditions (material properties, tooling conditions, lubrication conditions and ambient conditions, among others).
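
The quantities defined above can be sketched numerically. The following Python fragment is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the sign convention (actual stroke minus ideal stroke) and the finite-difference derivative are all hypothetical choices. The ideal stroke curve is taken as an input because its geometric formula is not given in this paper.

```python
def blank_reduction_ratio(s, r0):
    """Blank reduction ratio: flange-edge displacement s divided by
    the initial blank radius r0 (both in the same length unit)."""
    if r0 <= 0:
        raise ValueError("initial blank radius must be positive")
    return s / r0


def evaluation_phi(ddr, actual_stroke, ideal_stroke):
    """Return (phi, dphi) sampled at the blank reduction ratios in `ddr`.

    phi  : actual punch stroke minus ideal stroke (assumed sign convention)
    dphi : finite-difference estimate of d(phi)/d(ddr)
    `ddr` must be strictly increasing and contain at least two samples.
    """
    phi = [a - i for a, i in zip(actual_stroke, ideal_stroke)]
    last = len(phi) - 1
    dphi = []
    for k in range(len(phi)):
        lo = max(k - 1, 0)      # one-sided difference at the first sample
        hi = min(k + 1, last)   # one-sided difference at the last sample
        dphi.append((phi[hi] - phi[lo]) / (ddr[hi] - ddr[lo]))
    return phi, dphi
```

In a real system the sampled curves would come from the punch stroke and flange displacement sensors; here they are plain lists so that the sketch stays self-contained.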

In  this  proposed  architecture,  three  objective  functions  can  be  designed.  The  first  is  the  improvement  of  the 
cup  height,  which  can  be  achieved  by  applying  the  maximum  BHF  below  the  fracture  limit.  The  second  is 
process  energy  savings  by  implementation  of  the  minimum  BHF  beyond  the  wrinkle  limit.  The  third  is  for 
the  wall  thickness  uniformity,  whose  control  scheme  can  be  achieved  by  a  combination  of  the  above  two 
objective  functions. 




Fig. 4  Requirement of process information contained in the database (horizontal axis: progress of process)

Fig. 5  Sets of input membership functions

Table 1  If-then rules for φ, φ' and ΔBHF

If (φ, φ')                        Then (ΔBHF)
φ is Small and φ' is Small        ΔBHF = ΔBHF_SS
φ is Small and φ' is Large        ΔBHF = ΔBHF_SL
φ is Large and φ' is Small        ΔBHF = ΔBHF_LS
φ is Large and φ' is Large        ΔBHF = ΔBHF_LL

Fig. 6  A set of output membership functions (horizontal axis: ΔBHF / kN)


FUZZY  MODEL 

Application  of  the  fuzzy  model  provides  a  suitable  and  easy  way  to  optimize  process  control  because  the 
deep-drawing  process  is  not  only  unsteady  and  complicated  but  also  has  nonlinear  forming  characteristics. 

The sets of membership functions used for the antecedents of the If-then rules are designed from the database. The database must contain at least two typical constant-BHF conditions: a high BHF condition, which causes fracture, and a low BHF condition, which leads to wrinkling, as shown in Fig. 4. The former data are used to build two membership functions related to φ, as mentioned in the previous section. The latter data create two membership functions related to ψ. In the present study, only the two sets of membership functions concerned with φ and φ' were employed because the objective function is the improvement of cup height, as described above. The two maximum values φb and φ'b in Fig. 5 should correspond to the state of fracture. Hence each value was decided on the basis of the maximum value retrieved from the database of fracture limit conditions. Meanwhile, φa and φ'a were decided in a similar way, by substituting the minimum values of φ and φ' stored in the database.
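
As an illustration of how antecedent membership functions might be derived from stored data, the sketch below builds a "Small" and a "Large" membership function whose breakpoints are the minimum and maximum stored values, mirroring the roles of φa and φb. The linear ramp shape and all function names are assumptions made for this example; the actual shapes are those of Fig. 5.

```python
def ramp_memberships(a, b):
    """Return (small, large) membership functions on the interval [a, b].

    large(x) rises linearly from 0 at a to 1 at b and is clipped outside;
    small(x) is its complement. The linear shape is an assumption.
    """
    if b <= a:
        raise ValueError("need a < b")

    def large(x):
        return min(1.0, max(0.0, (x - a) / (b - a)))

    def small(x):
        return 1.0 - large(x)

    return small, large


def memberships_from_database(samples):
    """Build the membership pair from stored evaluation-function values,
    using the minimum as the lower breakpoint and the maximum as the
    upper breakpoint (the latter taken to represent the fracture state)."""
    return ramp_memberships(min(samples), max(samples))
```

With this construction, adding new fracture-limit data to the database automatically moves the breakpoints, which is one way to read the statement that the membership functions are designed through the database.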

Figure 6 shows the set of membership functions used for the consequents of the If-then rules. In the previous work [4], this part was decided with the assistance of an expert, whose experience resulted from trial and error. However, the new simplified set of membership functions does not require any experience, so the designer and machine operator need not be skilled and experienced. The initial range of each membership function in Fig. 6 can be automatically designed via this system. The user only has to provide the multiplier applied to the system output (ΔBHF), whose value was 0.2 here; it depends on the forming cell used. Table 1 shows the If-then rules for BHF control.

Figure 7 shows the fuzzy inference for ΔBHF used in this study. Although the max-min rule is the most common inference rule, the larger membership functions are omitted when the min-operator is used. However, it is desirable that both membership functions be considered. Therefore, in this work, the areas of the membership functions are used despite the use of the min-operator, as shown in Fig. 7. The fuzzy outputs of the individual fuzzy rules are combined using the max-operator, and the centroid of the resulting area is the output.
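
For orientation, the following sketch implements the conventional max-min inference with centroid defuzzification that the paragraph above takes as its starting point. It does not reproduce the area-based modification shown in Fig. 7. The triangular consequent shapes, their numerical values and the function names are hypothetical, since the values of the consequent membership functions are not listed in this paper.

```python
def tri(x, left, peak, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical consequent triangles (left, peak, right) in kN,
# keyed by the labels of (phi, phi'); not taken from the paper.
CONSEQUENTS = {
    ("S", "S"): (0.0, 1.0, 2.0),    # far from fracture: raise BHF
    ("S", "L"): (-1.0, 0.0, 1.0),
    ("L", "S"): (-1.0, 0.0, 1.0),
    ("L", "L"): (-2.0, -1.0, 0.0),  # close to fracture: lower BHF
}


def infer_delta_bhf(mu_phi, mu_dphi, consequents, lo, hi, steps=2000):
    """Standard max-min inference with centroid defuzzification.

    mu_phi, mu_dphi : dicts mapping "S"/"L" to membership degrees
    consequents     : dict mapping (label, label) to a triangle
    lo, hi          : range of the output axis that is sampled
    """
    num = 0.0
    den = 0.0
    for k in range(steps + 1):
        x = lo + (hi - lo) * k / steps
        mu = 0.0
        for (p, d), shape in consequents.items():
            w = min(mu_phi[p], mu_dphi[d])        # firing strength (min)
            mu = max(mu, min(w, tri(x, *shape)))  # clip, then combine (max)
        num += x * mu
        den += mu
    return num / den if den > 0.0 else 0.0
```

When only the Small/Small rule fires, the output is the centroid of its triangle; when all four rules fire equally with these symmetric shapes, the contributions cancel and the output is close to zero.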




Fig. 7  Fuzzy inference model for ΔBHF

Table 2  Material properties of blank used

Yield stress σs / N mm^-2        117
Tensile strength σB / N mm^-2    264
F value / N mm^-2                398
Elongation / %                   30.1
n value                          0.28
r value                          0.6

Table 3  Experimental conditions

Punch speed      5 mm/min, constant
BHF              0.5-50 kN, variable
Lubrication      Lubricating oil (218 mm²/s)
DR               1.98

Table 4  Tooling conditions

Punch shoulder radius rp / mm    4
Punch diameter Dp / mm           33
Die shoulder radius rd / mm      3
Die diameter Dd / mm             36.5

EXPERIMENT 

Material  Used  and  Experimental  Conditions 

Aluminum alloy sheet metal (A5182-O) of thickness 1.0 mm was used in the deep-drawing experiment. The material properties are listed in Table 2. The deep-drawing system used is capable of computerized control of BHF and the punch speed during the process [3]. The system has sensors for punch stroke, punch load, BHF, the radial drawing displacement of the blank flange (measured by a displacement transducer) and the blank holder displacement (measured by an eddy current displacement transducer). Tables 3 and 4 show the experimental conditions and tooling conditions, respectively.


Fig. 8  Punch load and controlled BHF curves during the process (horizontal axis: ΔDR*)

Fig. 9  Comparison of cup height between constant and variable BHF conditions (horizontal axis: BHF type)




Experimental  Procedure 

CONSTANT  BHF  DEEP-DRAWING  TEST 

The first step in the design process is to construct the database from the results of the constant BHF test. In the present architecture, the process information for both the fracture limit and wrinkle limit BHF conditions is essential. However, since this study uses the improvement of the cup height as the objective function to verify the effectiveness of the design system architecture, only process information related to the fracture limit BHF condition was collected and stored in the database.

FUZZY  CONTROLLED  VARIABLE  BHF  DEEP-DRAWING  TEST 
The variable BHF deep-drawing test with fuzzy control was conducted on the basis of the above objective function. The details of the procedure are as follows. First, membership functions are produced by using a database constructed from the constant BHF test. Second, the initial BHF, blank geometry and punch speed are input into the processor, and then the die descends at a constant speed. The BHF is automatically controlled in a closed loop by the obtained fuzzy rules so as to satisfy the objective function. In this study the initial BHF is set to 1.0 kN. For the objective function of the highest cup, the processor basically increases the BHF to the maximum possible value. When the evaluation function indicates a high possibility of fracture, the BHF is decreased in order to avoid fracture. Conversely, when the evaluation function shows a sufficient margin to fracture, the BHF is increased.
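
The closed-loop adjustment described above can be sketched as a simple update rule. This is an assumed reading rather than the authors' controller: the fuzzy output ΔBHF is scaled by the multiplier 0.2 quoted in the fuzzy model section, added to the current BHF, and limited to the 0.5-50 kN range of Table 3. The function names and the clamping step are hypothetical.

```python
def update_bhf(bhf, delta_bhf, gain=0.2, bhf_min=0.5, bhf_max=50.0):
    """One control step: add the scaled fuzzy output to the current BHF
    and keep the result inside the actuator range (all values in kN)."""
    return min(bhf_max, max(bhf_min, bhf + gain * delta_bhf))


def run_control(delta_bhf_sequence, bhf0=1.0):
    """Apply successive fuzzy outputs, starting from the initial BHF
    of 1.0 kN, and return the resulting BHF history."""
    bhf = bhf0
    history = [bhf]
    for delta in delta_bhf_sequence:
        bhf = update_bhf(bhf, delta)
        history.append(bhf)
    return history
```

Positive outputs (sufficient margin to fracture) raise the BHF step by step, while a negative output (high possibility of fracture) lowers it again.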


RESULTS  AND  DISCUSSION 

Figure 8 shows the experimental curves for punch load and BHF obtained with a variable BHF control system designed by the new system design architecture with a database. The fracture limit BHF curve obtained from the plastic deformation model [2] is also indicated. The variable BHF path shows that the BHF is raised as high as possible while avoiding the fracture limit, in accordance with the objective function. As a result, an improved drawn cup height was achieved, as shown in Fig. 9. However, the experimental results are still insufficient. At the next stage, it is necessary to feed back the process information obtained under the variable BHF conditions shown in Fig. 8 in order to optimize the fuzzy rules. Such a routine will enable optimization of the process control.


CONCLUSIONS 

1.  A new concept of an intelligent design system, in which a database replaces a knowledge expert, is proposed for intelligent sheet metal forming. A system architecture based on the proposed concept is developed and applied to the circular-cup deep-drawing process.

2.  The  validity  of  the  proposed  concept  and  system  is  confirmed  through  the  implementation  of  the 
system  architecture  for  the  fuzzy  control  variable  BHF  deep-drawing  process. 


REFERENCES 

1.  K. Manabe, S. Yoshihara, M. Yang and H. Nishimura, 1995. Fuzzy Controlled Variable BHF Technique for Circular-Cup Deep Drawing of Aluminum Alloy Sheet. Technical Papers NAMRI/SME, 41-46.

2.  K. Manabe, M. Yang and S. Yoshihara, 1998. Artificial Intelligence Identification of Process Parameters and Adaptive Control System for Deep-Drawing Process. J. Mater. Process. Technol., 80-81, 421-426.

3.  K. Soeda, K. Manabe and H. Nishimura, 1993. Process Control of Punch Speed and Blank Holding Force on Cylindrical Cup Warm Deep Drawing of Aluminum Alloy Sheets. Proc. 1993 Japanese Spring Conf. Tech. Plasticity (in Japanese), 65-68.

4.  S. Yoshihara, K. Manabe, M. Yang and H. Nishimura, 1997. Fuzzy Adaptive Control of Circular-Cup Deep-Drawing Process Using Variable Blank Holder Force Technique. J. JSTP, 435 (in Japanese), 46-51.




An  Iterative  Approach  to  Determine  Composition  and  Heat 
Treatment  from  the  Mechanical  Yield  Strength 
of  an  Aluminum-Lithium  Alloy 

J.  M.  Fragomeni 

Department  of  Mechanical  Engineering,  Stocker  Engineering  Center, 

Ohio  University,  Athens,  Ohio,  USA 


ABSTRACT 

The development of a model to predict an alloy's microstructure and processing variables from specific or desired mechanical properties was the general emphasis of this investigation. The processing variables included the alloy's overall heat treatment, which involves the aging practice (time and temperature) and the solution heat treatment practice, and also the manufacturing processing of the alloy, which involved direct extrusion. The particular mechanical property of interest for the aluminum-lithium demonstration alloy was the tensile strength. The microstructure was used as the basis for determining both the composition and the heat treatment processing requirements for obtaining the desired mechanical property. Specifically, a materials design model was developed to determine microstructural parameters from mechanical properties as the basis for prediction and/or specification of the heat treatment processing parameters. An iterative approach was taken to improve the initial determination of thermal processing and composition. The overall approach designs a precipitation hardened alloy's heat treatment and composition to satisfy the design tensile strength and microstructure requirements of a given materials design and manufacturing program.


BACKGROUND  AND  INTRODUCTION 

The strength of an alloy can vary considerably from one heat treatment to another, e.g., from the underaged to the peak-aged condition or from the peak-aged to the overaged condition. Thus, the strength of an alloy is a function of both aging time and aging temperature as well as composition. The strength of a precipitation strengthened alloy is substantially determined by the strengthening precipitates in the microstructure, which impede dislocation motion during plastic deformation. Plastic deformation of a metal or alloy results predominantly from the motion and generation of dislocations. The particles in the microstructure that act as obstacles to dislocation motion are a result of precipitation strengthening, or age hardening: the alloy is solution treated and quenched, and a second phase that is in solid solution at elevated temperature precipitates out upon quenching and aging at lower temperature. Both the strength and the hardness of an alloy increase with increasing aging time or increasing aging temperature after rapid cooling from the solution heat treatment, until the peak-aged condition is reached. For this age hardening or precipitation strengthening to be possible, the second phase must be soluble in solid solution at an elevated temperature and must exhibit decreasing solid solubility with decreasing temperature.

A  considerable  amount  of  energy  is  required  for  dislocations  to  propagate  through  an  array  of  precipitate 
particles,  either  through  dislocation  particle  shearing  or  dislocation  particle  looping.  The  elastic  strain 
energies  of  dislocations  as  well  as  the  dislocation  particle  interaction  mechanisms  are  important 
considerations.  In  the  underaged  heat  treatment  of  an  alloy,  the  precipitates  are  relatively  small  in  size  and 
therefore  easily  sheared  by  dislocations  moving  under  an  applied  stress.  If  an  alloy  is  further  precipitation 
hardened  with  an  increased  aging  time,  the  precipitate  particles  will  grow  in  size  and  provide  greater 
resistance  to  the  motion  of  dislocations.  As  a  result  there  is  an  observed  increase  in  the  strength  of  the 
alloy.  The  precipitate  particle  will  continue  to  increase  in  size  with  aging  time  until  a  critical  particle  size  is 
reached  where  the  force  of  interaction  with  the  dislocation  is  so  high  that  the  particles  are  no  longer  sheared 
but  rather  are  bypassed  by  the  dislocations.  The  dislocations  can  no  longer  shear  through  the  particles  so 
therefore bypass or may even loop the precipitates via the Orowan bowing/looping mechanism, which requires a lower shear stress than that required to shear the particles. The dislocations will bow between the particles and break away without cutting them. Beyond the critical particle size, where the particles are no longer sheared by the dislocations but are rather bypassed, the flow stress becomes inversely proportional to the inter-particle separation. When the precipitates continue growing and coarsening at constant volume fraction, an overaged material results: the inter-particle separation increases and the strength decreases. There is often a gradual transition between the shearing and looping mechanisms because of the random distribution of particle sizes, and this transition gives the smooth peak of the single-stage aging curve. Thus the combined influence of particle shearing and dislocation looping or bypassing results in the classical aging curve, in which the strength initially increases with increasing particle size and then decreases with further aging as a result of dislocation particle bypassing or looping.

The strength of a precipitation hardened alloy can be determined in terms of the single crystal strength, often referred to as the critical resolved shear strength (CRSS), which represents the strength of a single grain or crystal of the polycrystalline aggregate. The CRSS is controlled by the stress necessary for dislocations to glide freely through the array of precipitate particles in the single crystal. The single crystal CRSS can be correlated with the yield strength of the polycrystalline alloy. However, a direct comparison between the theoretically determined strength and the experimentally determined strength can be difficult, since the experimentally determined strength is usually expressed in terms of polycrystalline behavior. Additional variables, such as texture and grain boundary effects, must be considered for a polycrystalline alloy as compared to a single crystal.

The design approach/model was demonstrated with a precipitation strengthened alloy with a uniform microstructure and one primary strengthening particle phase, considering the yield strength as a function of the heat treatment conditions and composition. The precipitation hardened alloy considered was a particle strengthened ternary aluminum-lithium-zirconium alloy containing δ' particles as the strengthening phase. The general approach was successfully demonstrated with this alloy for the mechanical strength, so other mechanical properties and other alloy systems with more complex microstructures can be modeled. Thus the purpose of this study was to provide the foundation for future research with other more complex alloys and properties. The demonstration Al-Li-Zr alloy, containing approximately 2.6 wt.% lithium and 0.09 wt.% zirconium, was solution heat treated and artificially aged under various aging conditions (times and temperatures) to determine the age hardening curves at different temperatures. For this alloy, the primary microstructural strengthening contribution comes from δ' (Al3Li) particles that are uniformly distributed throughout the microstructure. These particles act as obstacles and impede dislocation glide during plastic flow. The δ' particle size, distribution, and spacing, which are directly affected by the material processing and composition, are directly responsible for the strength levels achieved.

The aluminum-lithium alloy system has several possible interaction mechanisms whereby coherent, ordered δ' precipitates, which have the Cu3Au (L1₂) crystal structure, can impede dislocation motion. Various deformation mechanisms arise from the glide motion of dislocations during plastic flow. When dislocations interact with precipitates, the mechanisms that may contribute to the strengthening include coherency strengthening, chemical strengthening, modulus strengthening, order strengthening, stacking fault strengthening, strengthening from friction stresses in the δ' precipitates, and Orowan strengthening. Order hardening and Orowan hardening are the important controlling strengthening mechanisms in Al-Li alloys, as well as in Ni-Al alloys, which are strengthened by the ordered γ' phase (Ni3Al, Ni3Ti or Ni3(Al,Ti)). For Orowan hardening, the dislocations bypass the precipitates, whereas for the other mechanisms the dislocations shear the precipitates. Thus, the approach taken is to relate the strengthening of the material to dislocation particle shearing or dislocation particle looping. With respect to particle shearing, the strength is roughly proportional to the average particle size squared, and with respect to particle looping, the strength is inversely proportional to the average particle radius.

The  material  processing  activates  and  controls  the  precipitation  process  which  is  responsible  for  the 
microstructural  development.  The  analytical  models  which  describe  the  particle  strengthening  of 
precipitation  hardened  alloy  systems  are  usually  derived  in  terms  of  the  single  crystal  strength  i.e.,  the 
critical  resolved  shear  strength  (CRSS).  The  CRSS  provides  the  basis  for  predicting  the  polycrystalline 
yield strength. Tensile data for the demonstration alloy were determined experimentally through a series of tensile tests at various aging conditions, covering the underaged, peak-aged, and overaged heat treatments, with variations in both aging temperature and aging time. Extrusion processing variables, such as extrusion temperature, extrusion ratio, and extrusion geometry, were also varied as part of this study. However, the extrusion ratio and extrusion temperature were found to have a negligibly small effect on the mechanical strength. Thus, an extensive experimental database of mechanical properties was available to assist in the development of the iterative micromechanical material design approach.


EXPERIMENTAL  METHODS 

Materials  and  Processing 

The demonstration material used for this research was an Al-2.6 wt.% lithium-0.09 wt.% zirconium alloy. The complete composition analysis (see Table 1) was performed by the Aluminum
Company  of  America  (ALCOA)  using  optical  emission  spectrometric  analysis.  One  large  ingot, 
approximately  2250  kg.,  was  cast  by  the  ALCOA  laboratories  due  to  the  difficulty  in  reproducibility  of 
casting  several  small  ingots.  The  casting  was  rolled  into  a  slab  having  dimensions  of  30.5  cm.  (12  in.)  X  96 
cm.  (38  in.)  X  30.5  cm.  (12  in.).  The  ingot  was  later  preheated  in  a  gas  fired  furnace  for  a  total  time  of  20 
hours.  The  first  8  hours  was  in  a  furnace  temperature  range  of  482-50CP  C  (900-925°  F)  and  the  last  12 
hours  in  a  furnace  temperature  range  of  527-538°  C  (980-1000°  F).  Several  billets  were  then  machined  from 
the  preheated  ingot,  having  the  dimensions  of  15.25  cm.  (6  in.)  in  diameter  and  25.4  cm.  (10  in.)  or  50.8 
cm.  (20  in.)  in  length,  to  be  used  for  the  extrusion  processing  of  the  demonstration  alloy.  The  aluminum- 
lithium-zirconium  billets  were  direct  extruded  after  being  reheated  to  temperatures  of  approximately  either 
466°  C  (870°  F)  or  290°  C  (555°  F).  Six  product  geometries  were  extruded  from  the  billets  using  an 
instrumented  2500  ton  press  in  the  direct  mode.  The  product  geometry  used  for  this  investigation  was  1.91 
cm.  (0.75  in.)  diameter  round  extruded  rod  in  the  longitudinal  grain  direction.  The  extrusion  ratio  was  73:1 
and  the  corresponding  aspect  ratio  was  1:1  for  the  round  rod  geometry.  An  extrusion  temperature  of 
approximately  339°  C  (642°  F  )  was  used  for  the  Al-Li-Zr  demonstration  alloy. 

Heat  Treatment 

The  specimens  were  first  solution  heat  treated  for  one  hour  at  550  °C  in  a  molten  sodium  nitrate  salt 
solution  followed  by  a  cold  water  quench.  Following  the  solution  heat  treatment,  the  specimens  were 
artificially aged for various lengths of time in a molten sodium nitrate (NaNO3) salt bath. The specimens
were immediately quenched in cold water after the aging treatment. The specimens were artificially aged
for a range of aging times from 0 to 225 hours at temperatures of 185 °C, 193 °C, and 200 °C.

Mechanical  Property  Characterization 

The tensile properties were determined experimentally by mechanically testing
the heat treated tensile specimens. The tensile specimens were machined from the extruded
product and tested according to the procedures outlined in the American Society for Testing and Materials (ASTM)
standard B557M-84 [1]. Full age-hardening curves were determined for the Al-Li demonstration alloy.

Metallographic  Examination 

Transmission  electron  microscopy  (TEM)  techniques  were  used  to  examine  and  photograph  the 
microstructure from thin foils obtained from samples aged at 185 °C for different aging times ranging from
24 hours to 225 hours. The samples were prepared from 1.91 cm (0.75 in.) diameter round rod (73:1
extrusion ratio) material extruded with an extrusion temperature of 339° C. Disks approximately 3 mm in
diameter were obtained from the thin foil specimens. The thin foil disks were then electropolished using a
twin jet polisher, with samples submerged in a 3:1 methanol-nitric acid solution (the electrolyte) and cooled
by liquid nitrogen to around -20 to -35°C. TEM negatives were taken directly from the thin foils.

Quantitative  Measurements 

Centered dark field images were used since they gave good contrast between images of the δ′ particles and
the matrix phase. Particle size measurements of both the Al3Li precipitates and composite Al3Li-Al3Zr
precipitates were performed directly from TEM negatives. A semiautomatic EyeCom II image analyzing
system was used to measure the particle sizes. The average particle size was measured for each aging time.
Particle size distributions of over 500 particles were constructed for each aging time. Two particle
diameters were measured for each particle in order to determine an aspect ratio for each particle. The aspect
ratios were used to quantitatively describe the spherical morphology of the particles in each size distribution.
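
As an illustration of how two measured diameters might be reduced to an aspect ratio, the following minimal Python sketch assumes the ratio is taken as the larger diameter divided by the smaller one, so that a value near 1 indicates a nearly spherical particle. The function names and the sample diameters are hypothetical and are not taken from the measurements reported here.

```python
def aspect_ratio(d1, d2):
    """Aspect ratio of one particle from two diameters (assumed: larger / smaller)."""
    if d1 <= 0 or d2 <= 0:
        raise ValueError("diameters must be positive")
    return max(d1, d2) / min(d1, d2)


def mean_aspect_ratio(diameter_pairs):
    """Average aspect ratio over a list of (d1, d2) diameter pairs."""
    ratios = [aspect_ratio(d1, d2) for d1, d2 in diameter_pairs]
    return sum(ratios) / len(ratios)


# Hypothetical diameters in Angstroms, for illustration only.
pairs = [(200.0, 200.0), (180.0, 200.0), (220.0, 200.0)]
print(mean_aspect_ratio(pairs))
```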


OVERALL  MATERIALS  DESIGN  APPROACH 

The  microstructure  provides  the  medium  through  which  the  mechanical  properties  are  related  to  the 
material  processing  and  composition.  Specifically,  the  method  taken  to  perform  this  research  was  to  first 
evaluate  and  determine  the  particle  size  and  size  distribution  that  would  be  required  to  achieve  a  required 
strengthening  contribution.  The  particle  size  distribution  and  average  particle  size  define  the  microstructure 
of  the  demonstration  alloy.  The  microstructure  would  then  be  used  as  the  basis  to  determine  the 
composition  and  heat  treatment  processing  variables.  With  respect  to  the  microstructure,  the  particle  size 
distribution  was  determined  from  the  Lifshitz-Slyozov-Wagner  (LSW)  coarsening  theory  [2,3],  in  order  to 
determine  the  growth  rate  of  particles,  average  particle  size  of  the  distribution,  and  critical  radii  at  the  onset 
of  coarsening.  An  iterative  approach  was  taken  to  assist  in  evaluating  the  particle  size  and  distribution  from 
the  mechanical  behavior,  and  the  material  processing  and  composition  from  the  microstructure.  Based  on 
the  LSW  coarsening  theory  for  constant  volume  fraction  of  precipitates,  the  aging  time  can  be  determined 
based on the average δ′ particle size for a given distribution and is given by:

$t = r^{3}/K_c$    (1)

where $t$ is the aging time, $K_c$ is the particle growth rate constant, and $r$ is the average δ′ particle size. The
average particle size and growth rate constant for a given alloy system can be evaluated from quantitative
measurements using quantitative microscopy from TEM analysis. For the Al-Li demonstration alloy, $K_c$
was equal to 35.1E-24 cm^3/sec based on quantitative measurements of particle sizes. Thus, the heat
treatment time, or artificial aging time, can be determined from equation 1. The heat treatment temperature
can be related to the composition (wt.% Li) of the Al-Li alloy through the expressions given by [4,5]:

$\ln\{K_c t\} = -12946.16/T + a$    (2)

where $T$ is the aging temperature, $K_c$ is the coarsening growth rate, $t$ is the aging time, and $a$ is an
empirical constant for the Al-Li alloy system dependent on the composition and given by the expression [4,5]:

$a = 0.52\,(\text{wt.\% Li}) - 19.97$    (3)
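
Equation 1 can be sketched in a few lines of Python. This is a minimal illustration, not the author's software: the function names are hypothetical, the particle size is assumed to be converted from Angstroms to centimeters so that it is consistent with a growth rate constant in cm^3/sec, and any initial particle size at the onset of coarsening is neglected, as in equation 1. The numerical inputs are the growth rate constant and aging time listed with Fig. 1; with them the sketch returns roughly 268 Angstroms, in line with the average particle radius given there.

```python
SECONDS_PER_HOUR = 3600.0
ANGSTROMS_PER_CM = 1.0e8


def aging_time_hours(r_angstrom, kc_cm3_per_s):
    """Equation 1, t = r^3 / Kc, with r in Angstroms and t returned in hours."""
    r_cm = r_angstrom / ANGSTROMS_PER_CM
    return r_cm ** 3 / kc_cm3_per_s / SECONDS_PER_HOUR


def particle_size_angstrom(t_hours, kc_cm3_per_s):
    """Equation 1 inverted, r = (Kc * t)^(1/3), with r returned in Angstroms."""
    r_cm = (kc_cm3_per_s * t_hours * SECONDS_PER_HOUR) ** (1.0 / 3.0)
    return r_cm * ANGSTROMS_PER_CM


# Values listed with Fig. 1: Kc = 0.36E-22 cm^3/sec, 148 hours of aging.
r = particle_size_angstrom(148.0, 0.36e-22)
print(round(r, 1), "Angstroms")
print(round(aging_time_hours(r, 0.36e-22), 1), "hours")
```

Because the size enters as a cube, a small uncertainty in the measured particle size gives a roughly threefold larger relative uncertainty in the estimated aging time.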

The  material  processing  and  composition  can  be  used  to  directly  evaluate/calculate  the  strength,  from  a 
previously  developed  model,  to  see  how  well  it  compares  with  the  actual  experimental  value  for  the 
strength  that  was  used  to  determine  the  first  estimate  of  the  material  processing  and  composition.  In  an 
alloy  system  such  as  aluminum-lithium  which  contains  shearable  particles,  the  single  crystal  critical 
resolved shear stress, $\tau_{CRSS}$, from the interaction of gliding dislocations with the ordered coherent
precipitates can be represented by an expression of the form [6],


where $\tau_{CRSS}$ is the single crystal strength, $r$ is the average particle size, $f$ is the volume fraction of the
strengthening precipitates, and the constant in the expression is a strengthening constant for the particular alloy. Based on the LSW
theory  with  constant  volume  fraction  of  precipitates,  the  overall  mechanical  strength  of  a  precipitation 
strengthened  alloy  can  be  thus  directly  related  to  the  average  particle  size  from  the  expression  given  by: 

$\sigma_0 = \sigma_{sss} + C\,r^{0.5}$    (5)

where $\sigma_0$ is the proof stress or the total mechanical strength of the material, and $\sigma_{sss}$ is the as-quenched strength,
which includes the contributions from the solid solution, grain size, and intrinsic lattice strengthening. $C$
is a materials constant dependent on the alloy strengthening, crystallographic texture, and microstructure,
and $r$ is the average particle size of the intermetallic strengthening precipitates in the microstructure, i.e., δ′
for the Al-Li alloy. The constant $C$ includes various material parameters/constants such as the antiphase
boundary energy, the shear modulus, volume fraction, Burgers vector, the Taylor grain orientation texture
factor, etc. For the Al-2.6Li-0.09Zr alloy used in this study the as-quenched strength, $\sigma_{sss}$, was found to be
approximately 140.6 MPa for the solution heat treated only condition. The calibration constant $C$ =
76,739,877.3 was determined by fitting a tensile yield data point in the underaged condition (4 hours at
185 °C) to the above expressions. Thus, if the yield strength is known in MPa, the average particle size in
Angstroms (1 Angstrom = $10^{-10}$ meters) can be estimated from equation 4, and thus the heat treatment
aging time can then be estimated from equation 1. In the peak-aged condition (48 hours at 185 °C) the yield
strength was found to be approximately 449.4 MPa. Using the calibration constant and the as-quenched
strength, the average particle size was calculated as 16.1E-11 meters (16.1 angstroms). The aging
temperature and composition (wt.% lithium) can then be optimized from equations 2 and 3 based on
different average particle sizes corresponding to different heat treatment aging times and temperatures.


DISCUSSION 

The  microstructure  provides  the  medium  through  which  the  mechanical  properties  are  related  to  the 
material  processing  and  composition.  Specifically,  the  method  to  be  taken  to  perform  this  research  will  be 
to  first  evaluate  and  determine  the  particle  size  and  size  distribution  that  would  be  required  to  achieve  a 
required  strengthening  contribution.  The  particle  size  distribution  and  average  particle  size  would  define 
the  microstructure  of  the  demonstration  alloy.  The  microstructure  would  then  be  used  as  the  basis  to 
determine  the  composition  and  heat  treatment  processing  variables.  With  respect  to  the  microstructure,  the 
particle size distribution can be determined from the Lifshitz-Slyozov-Wagner coarsening theory, in order
to  determine  the  growth  rate  of  particles,  average  particle  size  of  the  distribution,  and  critical  radii  at  the 
onset  of  coarsening.  Figure  1  shows  a  computer  generated  microstructure  for  the  Al-Li  alloy  which 
provided a basis for developing the overall materials design methodology. An iterative approach was
taken  to  assist  in  evaluating  the  particle  size  and  distribution  from  the  mechanical  behavior,  and  the 
material  processing  and  composition  from  the  microstructure.  The  material  processing  and  composition 
can  then  be  used  to  directly  evaluate/calculate  the  strength,  from  a  previously  developed  model,  to  see  how 
well  it  compares  with  the  actual  experimental  value  for  the  strength  that  was  used  to  determine  the  first 
estimate  of  the  material  processing  and  composition.  The  process  can  be  iterated  a  second  time  by  making 
small adjustments in the composition and/or heat treatment processing (aging temperature and aging time), which
would translate into some adjustment to the microstructure through a change in average particle size and size
distribution. For example, a slight increase in the content of the minor alloying element, lithium,
say from 2.0 to 2.2 wt.%, and/or a small increase in the aging temperature, say from 180°C to 183°C, would
result in an increase in the average particle size and distribution, which would most likely translate to some
increase in the mechanical strength, unless the alloy was already in the overaged condition, in which case the
mechanical strength would continue to decrease. If necessary, a third or further iteration could be done to
improve the computations. The overall approach could be integrated into a computer model for materials
design to assist in the calculations. After the materials design methodology has been applied to the
mechanical strength, it can then be applied to the other properties, such as fatigue life, fracture
toughness, ductility, and creep, as part of the overall materials design approach.

For  many  alloys,  the  macrostructural  grain  size  and  distribution  is  an  important  parameter  of  the 
microstructure with respect to controlling the mechanical behavior. The grain size and distribution, as
opposed to the particle size and distribution in the Al-Li demonstration alloy, is of considerable
importance in determining the mechanical strength in several alloy systems. However, in Al-Li
alloys the grain size strengthening effect on the yield strength is very small, and thus the particle
strengthening, as previously described, controls the polycrystalline yield strength of the
precipitation hardened alloy. The Hall-Petch coefficients for Al-Li alloys are very small, indicating the
small grain size strengthening contribution in Al-Li. However, in alloys where the Hall-Petch coefficients
are not small, the grain size strengthening is significant, and the strengthening can be evaluated through
the microstructure via an accurate determination of the size and distribution of the grains. The approach of
the  micromechanical  model  with  respect  to  the  grain  size  and  distribution  would  be  the  same  as  that  with 
the  particle  size  and  distribution  in  that  the  mechanical  behavior  can  be  related  to  the  microstructure 
through  either  the  particle  size  distribution  and/or  the  grain  size  distribution  on  the  microstructural  and 
macrostructural  levels  respectively.  In  some  alloys  both  the  grain  size  and  distribution  as  well  as  the 
distribution  of  precipitate  particles  within  the  individual  grains  would  be  important  microstructural 
parameters  in  relating  the  material  processing,  composition,  and  heat  treatment  to  the  mechanical  behavior 
via the microstructure, and thus both would need to be considered.

In general, grain size and the corresponding spatial distribution of grain boundaries in polycrystalline
materials have an important effect, for some materials, on many physical phenomena in physical
metallurgy. In addition to mechanical strength, properties such as ductility, fracture toughness, creep
resistance, fatigue, castability, superplasticity, etc. are all influenced by the spatial grain size distribution.
Various  analytical  methods  exist  for  calculating  the  size  distribution  of  grains.  These  methods  are  classified 
into  several  major  categories,  according  to  the  type  of  planar  measurements  performed.  These  methods  are 
based  on  measurements  of  section  diameters,  section  areas,  section  chords,  and  intercept  length.  The 
distribution  of  grain  sizes  on  a  given  microstructure  area  plane  can  be  determined  and  this  can  be  converted 
to  a  volume  distribution  of  grain  sizes.  The  volume  distribution  of  grain  sizes  in  an  alloy  is  a  more  accurate 
representation  of  the  internal  structure  and  this  can  be  used  as  a  basis  to  evaluate  and  determine  the 
mechanical  behavior  and  various  properties  of  materials. 


SUMMARY  AND  CONCLUSIONS 

The  processing  variables  included  the  alloy’s  overall  heat  treatment,  which  involves  the  aging  practice 
(time  and  temperature)  and  the  solution  heat  treatment  practice,  and  also  the  manufacturing  processing  of 
the  alloy,  such  as  the  extrusion  processing,  or  some  other  type  of  mechanical  processing  method.  The 
mechanical  property  of  interest  included  the  mechanical  tensile  strength,  or  yield  strength,  but  some  other 
mechanical  properties  of  interest  are  necessary  as  part  of  the  overall  approach  such  as  ductility,  fracture 
toughness, and fatigue behavior. Thus, this particular study is a summary of work currently in progress on the
development  of  an  overall  global  design  procedure  to  determine  thermal  and  mechanical  processing 
parameters,  and  alloy  composition  required  for  specific  mechanical  design  properties.  This  communication 
presents  the  progress  of  this  effort  for  the  mechanical  property  of  yield  strength.  Specifically,  a  materials 
design  model  was  designed  to  predict  microstructure  from  mechanical  behavior  as  the  basis  for  prediction 
and/or  specification  of  the  thermal  processing  parameters.  An  iterative  approach  was  taken  to  improve  the 
initial  determination  of  material  processing  and  composition  to  yield  the  correct  value  of  the  tensile 
strength.  The  overall  approach  will  design  a  precipitation  hardened  material  that  can  perform  according  to 
the  design  requirements  and  processing  capabilities  of  a  given  thermal  treating  and  manufacturing  facility. 


REFERENCES 

1. 1988 Annual Book of ASTM Standards, Section 3: Metals Test Methods and Analytical Procedures,
Volume 3.01, American Society for Testing and Materials, Philadelphia, PA, 1988.

2. I.M. Lifshitz and V.V. Slyozov, 1961, “The Kinetics of Precipitation from Supersaturated Solid
Solutions”, Journal of Physics and Chemistry of Solids, 19, 35-50.

3. C. Wagner, 1961, “Theories Associated with Age Hardening and Overaging During Ostwald
Ripening”, Zeitschrift für Elektrochemie, 65, 581-591.

4. K. Mahalingam, B.P. Gu, G.L. Liedl, and T.H. Sanders, Jr., 1987, “The Coarsening of δ′ (Al3Li)
Precipitates in Binary Al-Li Alloys”, Acta Metallurgica, 35(2), 483-498.

5. S.C. Jha, G.L. Liedl, K. Mahalingam, and T.H. Sanders, Jr., 1986, “Coarsening Phenomenon on
Aluminum-Lithium Alloys”, in Unusual Techniques and New Applications of Metallography, Vol. 24,
edited by R.J. Gray and L.R. Cornwell, ASM, Metals Park, Ohio.

6. E.A. Starke, Jr., 1977, “Aluminum Alloys of the 70's: Scientific Solutions to Engineering Problems”,
Materials Science and Engineering, 29, 99-115.


Table  1:  Composition  analysis  determined  by  optical  emission  spectrometric  analysis  for  the 
Al-2.6wt.%Li-0.09wt.%Zr  demonstration  alloy. 


Element   Al    Li    Zr    Cu    Mg    Si    Fe    Ti    B       Na      Ca
wt.%      bal   2.59  0.09  0.11  0.07  0.04  0.03  0.01  <0.001  <0.001  <0.001



[Figure 1: simulated microstructure image; parameters listed with the image]
Aging Time = 148.00 hours
Aging Temperature = 200.00 deg. Celsius
Growth Rate Constant = 0.36E-22 cm^3/sec
Average Particle Radius = 267.99 Angstroms
Weight Percent of Lithium = 2.80 %
Precipitate Free Zone Width = 9415.65 Angstroms
Volume Fraction of Precipitate = 0.21


Fig.  1.  Computer  simulated  TEM  microstructure  for  a  binary  aluminum-lithium  (Al-2.8wt.%Li) 
alloy  precipitation  age  hardened  for  148  hours  at  200  °C,  showing  the  distribution 
of Al3Li (δ′) precipitates in the matrix phase of the overall microstructure.




Statistical  Approach  to  Experimental  Design  to 
Determine  the  Effect  of  Extrusion  Variables  on  the 
Mechanical  Properties  of  an  Al-Li  Alloy 

J.  M.  Fragomeni 

Department  of  Mechanical  Engineering,  Stocker  Engineering  Center, 
Ohio  University,  Athens,  Ohio,  USA 


ABSTRACT 

The  objective  of  this  research  study  was  to  utilize  the  statistical  design  of  experiments  method  to  analyze 
and  evaluate  the  influence  of  the  extrusion  process  manufacturing  parameters  on  some  given  mechanical 
properties  for  an  aluminum-lithium  alloy  with  a  given  composition  and  post-extrusion  thermal  processing. 
The  goal  was  to  analyze  and  determine  if  variations  in  specific  extrusion  processing  variables  or 
parameters  had  a  statistically  significant  effect  on  some  given  mechanical  properties  such  as  yield  strength, 
ultimate  tensile  strength,  and  ductility.  The  layout  for  the  design  experiment  consisted  of  two  extrusion 
temperatures  and  two  extrusion  geometries.  The  layout  also  included  six  cross  section  areas  nested  in  two 
extrusion  geometries.  Thus,  the  design  of  the  experiment  was  a  mixture  of  a  nested  structure  and  a  factorial 
structure  i.e.,  a  nested  factorial  design.  This  design  included  interactions  crossed  between  the  extrusion 
temperature  and  the  extrusion  geometry,  and  the  extrusion  temperature  with  the  cross  sectional  area.  Thus 
there were twelve possible treatment combinations in the design of experiments.
It was found that the extrusion geometry was statistically significant at the 95% confidence level with respect to
the mechanical properties of interest and that the extrusion temperature was not significant at this level.


INTRODUCTION 

The emphasis of this investigation was to evaluate which extrusion processing parameters from a
direct  extrusion  manufacturing  process  had  a  statistically  significant  effect  on  mechanical  behavior.  This 
was  considered  important  since  one  may  desire  to  control  the  mechanical  properties  through  a 
manufacturing  extrusion  process  for  particular  design  applications.  Direct  extrusion  is  a  manufacturing 
process  whereby  metal  or  alloy  in  a  billet  form  is  forced  through  a  die  under  high  pressure  usually  at 
elevated  temperature.  The  billet  is  often  at  elevated  temperature  since  the  deformation  resistance  is  low  and 
therefore  less  force  and  energy  are  required  to  plastically  deform  the  billet  through  the  die  orifice  by  a  ram 
with  a  dummy  block  or  pressure  plate  at  the  end  of  the  ram  in  direct  contact  with  the  billet.  The  extrusion 
of  a  material  is  a  complex  process  that  involves  the  interaction  of  various  extrusion  processing  variables  to 
change  the  shape  of  the  starting  billet  material  and  thus  substantially  change  the  internal  structure  and 
properties  of  the  material.  The  most  common  methods  of  extrusion  include  direct  extrusion,  indirect 
extrusion,  and  hydrostatic  extrusion.  The  work  presented  in  this  investigation  involves  direct  extrusion  at 
elevated  temperatures  of  an  aluminum  alloy  billet  containing  lithium  that  was  extruded  into  both  round  and 
rectangular  cross-sectional  geometries  by  the  Aluminum  Company  of  America.  Direct  extrusion  is 
characterized  by  the  fact  that  the  process  occurs  in  a  solid  container  with  the  extruded  product  exiting  in 
the same direction as the motion of the ram that causes the deformation. The relative motion between the
billet and the container wall will cause heat to be generated during direct extrusion, and also some plastic
deformation. Thus the exit temperature of the extruded product will often be greater than the initial
temperature of the billet prior to extrusion. The rise in temperature causes variations in the temperature of
the material perpendicular to and transverse to the ram travel. Transverse variations in temperature produce
variations in structure, and hence in properties, across the extruded geometry. Therefore, in order to
accurately represent the extrusion temperature for the given extrusion process, an average equivalent
extrusion temperature, developed by Farag and Sellers [1], is often used and expressed as

$T_{eq} = (2\,T_0\,T_f)/(T_0 + T_f)$    (1)


0-7803-5489-3/99/$10.00 ©1999 IEEE.




where $T_0$ is the initial billet temperature, $T_f$ is the final billet temperature, and $T_{eq}$ is the average equivalent
extrusion temperature. Thus the general rise in temperature during extrusion can cause variations in the
internal structure and properties of the extruded product.
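
The average equivalent extrusion temperature defined above is the harmonic mean of the initial and final billet temperatures, so it always lies between the two and slightly below their arithmetic mean. The short Python sketch below illustrates this. It assumes temperatures on an absolute scale (Kelvin), which is not stated in the text, and the function names and the example temperatures are hypothetical.

```python
def celsius_to_kelvin(t_celsius):
    """Convert a temperature from degrees Celsius to Kelvin."""
    return t_celsius + 273.15


def equivalent_extrusion_temperature(t_initial, t_final):
    """Average equivalent extrusion temperature, Teq = 2*T0*Tf / (T0 + Tf)."""
    if t_initial <= 0 or t_final <= 0:
        raise ValueError("temperatures must be positive (absolute scale assumed)")
    return 2.0 * t_initial * t_final / (t_initial + t_final)


# Hypothetical example: the billet enters at 300 deg. C and exits at 350 deg. C.
t0 = celsius_to_kelvin(300.0)
tf = celsius_to_kelvin(350.0)
teq = equivalent_extrusion_temperature(t0, tf)
print(round(teq - 273.15, 1), "deg. C")
```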


EXPERIMENTAL  METHODS 

Material  Processing 

An  aluminum-lithium-zirconium  alloy  having  a  composition  of  2.6wt.%Li  and  0.09wt.%Zr  (see  Table  1) 
was  used  as  the  demonstration  alloy  for  this  investigation.  The  complete  composition  analysis  was 
performed  by  the  Aluminum  Company  of  America  (ALCOA)  using  optical  emission  spectrometric 
analysis. One large ingot (2250 kg.) was cast by the ALCOA laboratories because of the difficulty of
reproducibly casting several small ingots. The casting was rolled into a slab having dimensions of
30.5 cm. (12 in.) X 96 cm. (38 in.) X 30.5 cm. (12 in.). The ingot was later preheated in a gas fired furnace
for a total time of 20 hours. The first 8 hours were in a furnace temperature range of 482-500° C (900-925°
F) and the last 12 hours in a furnace temperature range of 527-538° C (980-1000° F). Several billets were
then  machined  from  the  preheated  ingot,  having  the  dimensions  of  15.25  cm.  (6  in.)  in  diameter  and  25.4 
cm.  (10  in.)  or  50.8  cm.  (20  in.)  in  length,  to  be  used  for  the  extrusion  processing  of  the  Al-Li-Zr 
demonstration  alloy. 

Extrusion  Processing 

The  aluminum-lithium-zirconium  billets  were  direct  extruded  after  being  reheated  to  temperatures  of 
approximately  either  466°  C  (870°  F)  or  290°  C  (555°  F).  Six  product  geometries  were  extruded  from  the 
billets  using  an  instrumented  2500  ton  press  in  the  direct  mode.  The  product  geometries  used  for  this 
investigation  were  1.91  cm.  (0.75  in.),  2.69  cm.  (1.06  in.),  and  5.3  cm.  (2.09  in.)  diameter  round  extruded 
rod  in  the  longitudinal  grain  direction.  The  extrusion  ratios  were  73:1,  36.5:1,  and  9:1  and  the 
corresponding  aspect  ratio  was  1:1  for  the  round  rod  geometry.  An  extrusion  temperature  of  approximately 
339° C (642° F) was used for the Al-Li-Zr demonstration alloy.

Extrusion  Post-Processing 

The  Al-Li  alloy  was  machined  into  tensile  samples  from  the  extruded  product.  The  tensile  samples  were 
oriented  in  the  longitudinal  grain  direction.  The  tensile  samples  were  first  solution  heat  treated  (SHT)  for 
one  hour  at  550°  C  (1022°  F)  in  a  molten  sodium  nitrate  salt  solution  followed  by  a  cold  water  quench 
(CWQ)  to  room  temperature.  Following  the  solution  heat  treatment,  the  tensile  samples  were  artificially 
aged for various lengths of time in a molten sodium nitrate (NaNO3) salt bath. Different aging treatments
were utilized by varying both the time and the temperature. The samples were immediately quenched in
cold water, at approximately room temperature, after the artificial aging treatment. The samples were aged
at temperatures of 185° C (365° F) and 193° C (379° F). The molten salt solution was continuously stirred
throughout the solution heat treatment and aging process to ensure a uniform temperature distribution
throughout  the  bath. 

Monotonic  Tensile  Tests 

The  experimentally  determined  values  for  the  tensile  properties  along  the  longitudinal  direction  were 
obtained  from  mechanically  testing  the  heat-treated  tensile  samples.  Tensile  testing  was  performed  in 
accordance with the American Society for Testing and Materials (ASTM) standard B557M
[2]. All the tensile testing was performed at room temperature with the test machine
operating in stroke control. The mechanical testing was performed utilizing a ±22 kip (100 kN) MTS
System  Corporation  electrohydraulic  testing  system.  For  the  purposes  of  this  investigation,  round  rod 
tensile  samples  were  machined  from  the  round  geometry  extruded  product  in  the  longitudinal  grain 
direction.  The  tensile  samples  were  tested  in  the  longitudinal  orientation.  It  was  determined  from  an 
extensive tensile database of the demonstration alloy that the tensile strength data were reproducible, with no
statistically significant variation in strength for constant aging conditions.

Transmission  Electron  Microscopy  and  Particle  Size  Distribution  Statistics 

The particle size distribution and particle morphology were examined and photographed using transmission
electron microscopy (TEM) from thin foil specimens obtained from samples aged at 185° C for different
aging times ranging from 24 hours to 225 hours. The thin foil specimens were sliced with a diamond blade
saw cutter and then polished to foils approximately 0.05 mm thick. Disks approximately 3 mm in diameter
were then punched from the thin foils. The thin foil disks were then electropolished using a twin jet
polisher, with the disks submerged in a 3:1 methanol-nitric acid solution cooled by liquid nitrogen to
around -20 to -35° C. The thin foil disks were observed and photographed using a JEOL-200 CX
microscope operating at 200 kV for various specimen inclinations. Figure 1 shows the TEM
microstructure of the Al-Li alloy, with the δ′ (Al3Li) precipitates distributed uniformly in the matrix.
Particle size measurements of the Al3Li precipitates were made directly from TEM negatives. A
semiautomatic EyeCom II image analyzing system was used to measure particle sizes. The average particle
sizes were measured for each aging time. In order to perform the statistical analysis of the particle
distributions, particle size distributions consisting of over 500 particles were constructed and analyzed for each
heat treated aging time. The coefficients of skewness, kurtosis, and variation, as well as the first, second,
third, and fourth moments about the mean were evaluated for each of the particle size distributions.
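
The distribution statistics named above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the moments are taken in population form (dividing by the number of particles), skewness and kurtosis are taken as the standardized third and fourth moments, and the coefficient of variation is taken as the standard deviation divided by the mean. The text does not state which definitions were used, and the function names and particle sizes below are hypothetical.

```python
import math


def central_moment(values, order):
    """k-th moment about the mean of a sample (population form, divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n


def distribution_statistics(values):
    """Mean, moments about the mean, and shape coefficients of a size distribution."""
    mean = sum(values) / len(values)
    m2 = central_moment(values, 2)
    m3 = central_moment(values, 3)
    m4 = central_moment(values, 4)
    return {
        "mean": mean,
        "m1": central_moment(values, 1),  # zero by definition, up to rounding
        "m2": m2,
        "m3": m3,
        "m4": m4,
        "variation": math.sqrt(m2) / mean,  # coefficient of variation
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # equals 3 for a normal distribution
    }


# Hypothetical particle sizes in Angstroms, for illustration only.
sizes = [150.0, 170.0, 180.0, 185.0, 190.0, 200.0, 240.0]
for name, value in distribution_statistics(sizes).items():
    print(name, round(value, 4))
```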


RESULTS 

As shown by Table 2, the layout for the experiment consists of the two extrusion temperatures of 320°C
and 450°C, and two extrusion geometries. The layout also includes the six cross section areas nested in the
two extrusion geometries. Thus, the design of this experiment is a mixture of a nested structure and a
factorial structure, i.e., a nested factorial design. This design includes interactions crossed between the
extrusion temperature and extrusion geometry and the extrusion temperature with the cross section. The
nested factorial design was chosen over a completely randomized design, a randomized complete block
design, and a nested design since it best describes the analysis of this experiment. From the layout of the
experiment, it can be seen that there are twelve possible treatment combinations. Once the
mechanical testing system has been set up, it takes approximately 30 minutes to an hour to perform a
single test. Therefore, if no replications are performed, i.e., one observation per cell block, then the
experiment can be run in a single day. If more than one observation per cell block is desired,
the entire experiment can still be run, but more than one day is necessary. The specimens
corresponding to the different treatment combinations are assigned random numbers one through
twelve. The specimen designations could be as shown by Table 3. Equation 2 summarizes the
corresponding model for the design of experiments.

$$Y_{ijkl} = \mu + G_i + S_{j(i)} + T_k + GT_{ik} + TS_{jk(i)} + \varepsilon_{(ijk)l} \qquad (2)$$

Here $Y_{ijkl}$ represents the given mechanical property, such as the yield strength, elastic modulus,
elongation, or ultimate tensile strength; $\mu$ represents the overall mean; $G_i$ represents the effect of
the ith level of the extrusion geometry; $S_{j(i)}$ represents the effect of the jth level of the cross
section within the ith level of the extrusion geometry; $T_k$ represents the effect of the kth level of the
extrusion temperature; $GT_{ik}$ represents the effect of the interaction of the ith level of the extrusion
geometry and the kth level of the extrusion temperature; $TS_{jk(i)}$ represents the effect of the
interaction of the jth level of the cross section within the ith level of the extrusion geometry and the
kth level of the extrusion temperature; and $\varepsilon_{(ijk)l}$ represents the effect of the lth sample
within the ith extrusion geometry, the jth cross section, and the kth extrusion temperature. The
between sample terms include $G_i$ and $S_{j(i)}$. The within subject terms include $T_k$, $GT_{ik}$,
$TS_{jk(i)}$, and $\varepsilon_{(ijk)l}$.
Using this analysis, the two extrusion geometries could be coded by allowing G to equal 1 or 2. The
cross section could then be coded by letting S equal 1, 2, or 3, and the extrusion temperature could be
coded by letting T equal 1 or 2. The numbers 1 through 12 could be randomly drawn and the
corresponding treatment combinations run in the exact order in which they were drawn.
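
The coding and random run order described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `randomized_run_order` and the fixed seed are hypothetical, and the seed is used solely to make the example repeatable.

```python
import itertools
import random

GEOMETRY = (1, 2)          # G: two extrusion geometries
CROSS_SECTION = (1, 2, 3)  # S: three cross sections nested in each geometry
TEMPERATURE = (1, 2)       # T: two extrusion temperatures

def randomized_run_order(seed=None):
    """Return the 12 coded treatment combinations (G, S, T) in random run order."""
    treatments = list(itertools.product(GEOMETRY, CROSS_SECTION, TEMPERATURE))
    rng = random.Random(seed)  # seed only makes the illustration repeatable
    rng.shuffle(treatments)
    return treatments

run_order = randomized_run_order(seed=0)
for run, (g, s, t) in enumerate(run_order, start=1):
    print(f"run {run:2d}: G={g} S={s} T={t}")
```

Because S is nested in G, the code S = 1 under G = 1 denotes a different cross section than S = 1 under G = 2.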

Table 4 shows the ANOVA for the fixed model, which applies to the tensile specimens that have been
mechanically tested to date. However, a future experiment could be performed using a random
model in which all the extrusion variables are random factors. This is possible because the mechanical
testing system is not constrained by extrusion geometry, extrusion temperature, or extrusion cross section
dimensions. The ANOVA table for the random model is shown in Table 5. Table 6 shows the layout of the
experimental yield strength tensile data for the Al-2.6wt.%Li-0.09wt.%Zr demonstration alloy in the
peak-aged heat treated condition. This layout includes experimental tensile data for the two extrusion
temperatures and the six cross section areas nested within the two extrusion geometries. The two missing
data points correspond to future tensile tests.




DISCUSSION 

The design of this experiment defined an inference space for this particular alloy as well as for other
aluminum-lithium alloys of varying composition (different weight percents of lithium and aluminum).
The inference space states what the experimenter would like to infer: if any of the extrusion processing
variables included in the design shows an effect in the experiment, then the same effect on mechanical
properties would appear in all tensile specimens obtained from the ingot from which the
twelve were selected, as well as in other cast Al-Li ingots of different compositions. All of the mechanical
testing was performed with the extrusion variables fixed, using an MTS Systems Corporation
electrohydraulic mechanical testing system. The mechanical tensile data, including the yield strength,
the ultimate tensile strength, the percent elongation, and the elastic modulus, were analyzed using the
analysis of variance approach. The Student-Newman-Keuls multiple range test was used to compare the
means, ordered from the smallest to the largest, to determine which means are not significantly different.
All of the billets used for the extrusion processing were obtained from the same ingot, so that they shared
the same ingot processing and composition and a possible restriction error was avoided. Thus one large
ingot was cast instead of several smaller ingots for better reproducibility of the chemical composition of
the individual billets. In addition, billets were cut from the center of the ingot for better chemical
homogeneity. This was done to avoid any possible composition variations from the surface to the center
of the ingot.


SUMMARY  AND  CONCLUSIONS 

The design of the experiment was a mixture of a nested structure and a factorial structure, i.e., a nested
factorial design. This design included the interactions of the extrusion temperature with the
extrusion geometry and of the extrusion temperature with the cross sectional area. Thus there were
twelve possible treatment combinations in the design of experiments method. It
was found that the effect of extrusion geometry on the mechanical properties of interest was statistically
significant at the 95% level or higher, and that the extrusion temperature was not significant at this level.

The design of the experiment defined an inference space for the demonstration alloy as well as for other
aluminum alloys containing lithium in varying compositions. The inference space states what the
experimenter would like to infer: if any of the extrusion processing variables included in the design
shows an effect in the experiment, then the same effect on mechanical properties would appear in all tensile
samples processed from the ingot from which the twelve were selected, as well
as in other cast Al-Li ingots of varying compositions.


REFERENCES 

1. M.M. Farag and C.M. Sellars, 1973, "Flow Stress in Hot Extrusion of Commercial Purity
Aluminum", Journal of the Institute of Metals, 101, 229-238.

2. 1988 Annual Book of ASTM Standards, Section 3: Metals Test Methods and Analytical Procedures,
Volume 3.01, American Society for Testing and Materials, Philadelphia, PA, 1988.


Table  1:  Composition  analysis  in  weight  percent  determined  by  optical  emission  spectrometric 
analysis  for  the  aluminum-lithium-zirconium  demonstration  alloy. 


Al    Li    Zr    Cu    Mg    Si    Fe    Ti    B       Na      Ca
bal   2.59  0.09  0.11  0.07  0.04  0.03  0.01  <0.001  <0.001  <0.001



Fig 1. Transmission electron dark field micrograph showing the microstructure of the
Al-2.6wt.%Li-0.09wt.%Zr demonstration alloy aged at 185 °C for 225 hours (186,000X).


Table  2:  Layout  of  the  extrusion  processing  variables  showing  a  nested  factorial  design. 


[Table body not recovered. Columns: EXTRUSION TEMPERATURE, 320°C and 450°C; first row: 5.3 dia. round cross section.]




Table  3:  Treatment  number  designations  for  different  levels. 


Table  4:  Analysis  of  variance  table  for  fixed  statistical  design  model. 




Table  5:  Analysis  of  variance  table  for  random  statistical  design  model. 


Source        df    i   j   k   l    EMS
                    R   R   R   R
G_i            1    1   3   2   1    σ²_ε + 6σ²_G + 2σ²_S + 3σ²_GT + σ²_TS
S_j(i)         4    1   1   2   1    σ²_ε + 2σ²_S + σ²_TS
T_k            1    2   3   1   1    σ²_ε + 6σ²_T + 3σ²_GT + σ²_TS
GT_ik          1    1   3   1   1    σ²_ε + 3σ²_GT + σ²_TS
TS_jk(i)       4    1   1   1   1    σ²_ε + σ²_TS
ε_(ijk)l      24    1   1   1   1    σ²_ε


Table 6: Layout showing experimental yield strength tensile data for the Al-2.6wt.%Li-0.09wt.%Zr
demonstration alloy in the peak-aged heat treated temper, 185°C @ 48 hours (all tensile data given in ksi).

                                        EXTRUSION TEMPERATURE
EXTRUSION GEOMETRY   Cross section      320°C         450°C
Round                5.3 dia.           66.9, 68.2    62.7, 64.6
                     2.69 dia.          67.1, 67.0    68.4, 67.8
                     1.91 dia.          69.5, 69.2    68.2, 67.8
Rectangular          2.54 x 8.9         46.0, 45.1    -
                     0.635 x 8.9        46.8, 47.3    -
                     0.318 x 8.9        38.9, 40.1    39.1, 39.2



Control  of  Liquid  Segregation  of  Semi-solid  Aluminum  Alloys 
during  Intelligent  Compression  Test 

C.G.  Kang,  K.D.  Jung,  H.K.  Jung 

Pusan  National  University,  School  of  Mechanical  Engineering 
Pusan,  609-735,  Korea 

ABSTRACT 

Thixoforming is becoming increasingly popular for manufacturing near-net shape parts, and it has so
many advantages over conventional forming technologies that it is being studied
worldwide. Components produced by the semi-solid forming process have good mechanical properties and
fewer defects such as porosity. The relationship between stress and strain is very important for designing
a die that avoids product defects during the semi-solid forming process.

Since the liquid in these alloys is of eutectic composition, liquid segregation produces significant and
undesirable effects in the product. The materials used in this experiment are A357, A390, and Al2024
alloys that have been fabricated by the electro-magnetic stirring process from Pechiney in France. In the
compression test with semi-solid state materials, the liquid phase coexists with globularized solid
particles. The billet temperature corresponding to the desired solid fraction is controlled by an induction
heating system constructed with designed coil dimensions. The intelligent compression test was performed
using induction heating and an MTS machine. During compression, the specimen temperature was
continuously measured.

In order to prevent liquid segregation, the measured temperature is useful for controlling the strain rate
during compression. The liquid segregation is controlled by changes in the strain rate and solid fraction
during the compression process. The characteristics of flow between the solid and liquid phases,
considering liquid segregation, are examined through the above experiments. Generally, it is known that
if the stress applied to semi-solid alloys is below the yield stress, the alloys show solid-like behavior,
but if the applied stress exceeds the yield stress, the semi-solid alloys flow like a liquid. Therefore, it is
important to establish the yield point to predict the rheological behavior of semi-solid alloys.

INTRODUCTION 

The technology of forming metals into near-net shaped products in a partially remelted state satisfies the
growing demand for high-strength, lightweight aluminum components in the automobile industry and is
gaining attention in the broad field of engineering. To optimize the net shape forging process with
semi-solid materials (SSM), it is important to predict the deformation behavior under varying strain rate,
but the rheological behavior of mushy state alloys is not sufficiently known. Usually, the rheological
behavior of alloys in the semi-solid state has been examined using parallel plate compression. However,
for analysis of the thixoforming process, it is necessary to establish the stress-strain curve of
semi-solid alloys. In particular, an important problem is to prevent segregation of the liquid component
during deformation. When semi-solid aluminum alloys are compressed, the liquid component subjected to
compressive forces may be ejected through the billet surface.

Parts of complex shape are fabricated by casting and forging. Cast products are limited in their
mechanical properties by microstructural defects such as porosity. Near-net shape forming by hot forging
and hot extrusion from the solid state holds some promise to improve mechanical properties by removing
the micro defects found in castings. On the other hand, the forming of a precision product is
limited by the higher forming load, and there is a loss in productivity and economic efficiency due to
post-processing requirements such as machining. To overcome these problems, a semi-solid forming
process that produces near-net shaped parts from semi-solid materials (SSM) is widely studied.

Semi-solid forming is a method of producing globular structures and forming a near-net-shape product at
temperatures between the solidus and liquidus. Much attention is focused on aspects such as energy
saving and process efficiency [1-2]. Since manufacturing technology for semi-solid alloys was


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




discovered during studies on hot-tearing phenomena in the early 1970s, Suery et al.[4] have studied the
compression behavior of Sn-15%Pb alloy in the semi-solid state and the effect of strain rate, and Toyoshima
et al.[5] have studied simple compression, filtering, and rolling processes. They formulated their models by
applying compressive visco-plastic models for the solid region of the materials and assuming Darcy's flow
for the liquid region. Kang et al.[6] compared experimental data with the results of finite-element analysis
for the compression of semi-solid aluminum (A356) materials using a yield condition for porous material.
For SSM forging, Kenny et al.[7] reported that the best production quality was obtained at a solid fraction
in the billet of 60-70%. Yoshida et al.[8] reported on the microstructure and liquid flow state of Al-Cu
alloy in semi-solid forging. When material with a solid fraction of 60% was cast at high velocity, no
segregation was observed in the produced specimen. Kang et al.[9] showed the effect of gate
shape and forging temperature on mechanical properties in the injection forging process of semi-solid
aluminum material. To prevent segregation of liquid in forging, the die should be preheated. It is reported
that the lowest die preheating temperature depends on the billet temperature and the compressing die
shape, and that the optimal die temperature is 250 to 300 °C. Chen[10], in studying the deformation
behavior of SSM, divided it into deformation of the liquid region according to the mutual
contact of the liquid region and solid grains, plastic deformation of the solid grains, and contact of the
solid grains.

The stress-strain rate curve of semi-solid material differs from that of conventional materials in hot
compression, where deformation occurs in the solid grains. It is important to note that, in compression
experiments on the semi-solid forging process, the limiting strain rate does not increase but decreases
as the stress increases.

Therefore, in this study, compression experiments were performed to investigate the deformation behavior
of semi-solid material under variation of processing parameters such as compression velocity and solid
fraction. In order to produce components without defects, the forging condition is controlled so that the
stress increases with increasing strain rate after initial forming at constant velocity. The velocity
variation needed for a continuous increase of stress with increasing strain rate has therefore been designed
into the compression experiments.

COMPRESSION  EXPERIMENTS 

The materials used in these experiments are ALTHIX (A357, A390) material from Pechiney, which is
fabricated by electro-magnetic stirring, and Al2024, which is fabricated by hot extrusion at a compression
ratio of 9.37. The chemical composition of each material is shown in Table 1.


Table 1 Chemical composition of A357 (ALTHIX), Al2024 and A390 (ALTHIX)

Alloy            Si     Fe     Cu     Mn     Mg     Cr     Zn     Ti     Pb
A357    Min(%)   6.5    -      -      -      0.30   -      -      -      -
        Max(%)   7.5    0.15   0.03   0.03   0.40   -      0.05   0.20   0.03
Al2024  Min(%)   -      -      3.8    0.3    1.2    -      -      -      -
        Max(%)   0.50   0.70   4.9    0.9    1.8    0.10   0.25   0.15   -
A390    Min(%)   16.0   -      4.0    -      0.5    -      -      -      -
        Max(%)   17.0   0.4    5.0    0.1    0.65   -      0.05   0.2    0.03

In forming semi-solid material, compression experiments have been performed to investigate the
relationship of flow characteristics with variations in solid fraction and die velocity. The compression test
of semi-solid material is performed with specimens of φ x h = 15 x 20 mm at a temperature of 1200 °C
with a desired solid fraction using an MTS (Material Test System) with an associated electronic furnace.
Fig. 1 is the schematic diagram of the experimental apparatus used for compression of the semi-solid
aluminum alloy. Temperature, solid fraction, load, displacement and compression velocity are measured
during an SSM compression test. First, the stress-strain curve is obtained from the load and displacement.
Generally, in SSM compression tests, the stress decreases once a maximum has been reached.
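
The conversion from load and displacement to the engineering stress and true strain plotted in the later figures can be sketched as follows. This is a minimal illustration, assuming the φ15 x 20 mm cylindrical specimen stated above; the function name and the load and displacement readings are hypothetical.

```python
import math

def stress_strain(loads_N, displacements_mm, diameter_mm=15.0, height_mm=20.0):
    """Convert compression load-displacement data to engineering stress (MPa)
    and true compressive strain for a cylindrical specimen."""
    area0 = math.pi * diameter_mm ** 2 / 4.0  # initial cross section, mm^2
    stresses, strains = [], []
    for load, disp in zip(loads_N, displacements_mm):
        height = height_mm - disp                     # current specimen height
        stresses.append(load / area0)                 # N/mm^2 = MPa
        strains.append(math.log(height_mm / height))  # positive in compression
    return stresses, strains

# Hypothetical readings, for illustration only
loads = [0.0, 500.0, 900.0]
disps = [0.0, 1.0, 2.0]
sigma, eps = stress_strain(loads, disps)
```

Engineering stress uses the initial cross section, so it does not account for barreling or for the area increase during compression.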




Fig. 1. Schematic diagram of the intelligent compression test apparatus
designed to prevent liquid segregation during an SSM experiment.

The velocity is then changed at the displacement corresponding to the maximum stress in order to prevent
liquid segregation. In the stress-strain curve obtained in this way, the stress increases as the strain
increases. Compression experiments have been performed on A357, A390, and Al2024 using the compression
test method used for A356. In the case of Al2024, the experimental conditions are solid fractions of
50% (620°C), 70% (599°C) and 90% (556°C), and die velocities of 500, 200, 100, 10, and 1 mm s⁻¹.
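
Locating the displacement at which the velocity is changed can be sketched as a search for the stress maximum in a constant-velocity record. This is an illustration only, not the control logic of the MTS system; the function name and the data are hypothetical.

```python
def velocity_change_point(displacements_mm, stresses):
    """Return (displacement, stress) at the maximum stress of a
    constant-velocity test; the die velocity would be changed here."""
    peak_index = max(range(len(stresses)), key=lambda i: stresses[i])
    return displacements_mm[peak_index], stresses[peak_index]

# Hypothetical constant-velocity record, for illustration only
disps = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
sigma = [0.0, 1.2, 2.0, 2.4, 1.9, 1.6]
switch_disp, peak_stress = velocity_change_point(disps, sigma)
```

With noisy measurements, the record would need smoothing before the maximum is taken.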


EXPERIMENTAL  RESULTS  AND  DISCUSSION 

Compression experiments were performed using A357, A390 and Al2024. Fig. 2(a)-(e) shows the shape of
the specimen at a solid fraction of 50% (620°C) as the die velocity Vd changes. The specimen surface
fractures progressively as the die velocity increases. Fig. 3(a)-(e) shows the shape of the specimen
at a solid fraction of 70% (599°C) as the die velocity changes.


Fig.2. Compression at fs = 50% (Al2024)
where Vd (mm/s) is
a. 500 b. 200 c. 100 d. 10 e. 1


Fig.3. Compression at fs = 70% (Al2024)
where Vd (mm/s) is
a. 500 b. 200 c. 100 d. 10 e. 1


There is only a small difference in the deformation compared to 50%, but the degree of surface fracture is
slightly lower. Fig. 4 (a),(b) shows the shape of the specimen at a solid fraction of 90% (556°C). As the
solid fraction increased, barreling similar to that in hot compression was observed.


Fig.4. Compression at fs = 90% (Al2024)
where Vd (mm/s) is: a. 10 b. 1




Figures 5 to 7 show the stress-strain curves for different die velocities at solid
fractions fs of 50%, 70%, and 90%. As shown in Fig. 5 and Fig. 6, the initial stress peak is
observed at strains of 0.05 to 0.1. Beyond a strain of ε = 0.1, however, the stress decreases remarkably and
reaches a plateau. This is explained by liquid flow being activated and moving to the
free surface at that strain, even though the stress had been increasing from the very start of the test
through densification of the structure and stimulation of liquid flow.


Fig.5. Engineering stress-true strain curve at fs = 50% (Al2024)
(Vd = 500 mm/sec to 1 mm/sec)

Fig.6. Engineering stress-true strain curve at fs = 70%
(Vd = 500 mm/sec to 1 mm/sec)


In compression forming of semi-solid materials at high temperature, the surface of the specimen breaks
away during compression because liquid flows towards the surface of the specimen. Therefore, a forming
method using closed forging shapes, in which no free surface exists, is needed. Fig. 7 shows that the shape
of the curve at a solid fraction of fs = 90% is different from that at 50% and 70% (see Figures 5 and 6).
This is because, at high temperature, the solid phase is not globularized, the structure has no
orientation in the direction of extrusion, and liquid flow does not occur during the early stages of
deformation; the stress therefore increases quickly and then decreases as the microstructure fractures.


Fig.7. Engineering stress-true strain curve
at fs = 90% (Al2024)
(Vd = 10 mm/sec to 1 mm/sec)


Fig.8. Relationship between log σ and log ε̇
at fs = 50%, fs = 70% (Al2024)


Fig. 8 shows the relationship between stress and strain rate on logarithmic coordinates (log ε̇ - log σ)
for compression experiments at solid fractions fs of 70% and 50%, employing the definition of the
coefficients K and m (flow stress equation: $\sigma = K\dot{\varepsilon}^{m}$). The K and m values are
obtained by linear regression to give the following:


• Solid fraction fs = 50%: $\sigma = K\dot{\varepsilon}^{m}$: K = 0.49, m = 0.21 (Al2024)

• Solid fraction fs = 70%: $\sigma = K\dot{\varepsilon}^{m}$: K = 1.34, m = 0.48 (Al2024)
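
The linear regression for K and m can be sketched as an ordinary least-squares fit of log σ against log ε̇. This is a minimal illustration, not the authors' fitting code: the function name is hypothetical, and the data are synthetic values generated from K = 0.49 and m = 0.21 rather than measurements. The units of K follow those of the stress data, which the text does not state.

```python
import math

def fit_power_law(strain_rates, stresses):
    """Least-squares fit of log10(stress) = log10(K) + m * log10(strain_rate)."""
    xs = [math.log10(r) for r in strain_rates]
    ys = [math.log10(s) for s in stresses]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx                     # slope of the log-log line
    K = 10 ** (y_mean - m * x_mean)   # intercept converted back from log10
    return K, m

# Synthetic data generated from K = 0.49, m = 0.21 (not measured values)
rates = [0.05, 0.5, 5.0, 25.0]
stresses = [0.49 * r ** 0.21 for r in rates]
K, m = fit_power_law(rates, stresses)
```

The slope of the fitted line is the strain-rate sensitivity m, and the intercept gives log K.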


Fig.9. Evolution of the microstructure of semi-solid alloy
as a function of variation in die speed (fs = 50%, 620°C)
a. center b. surface

Fig. 9 a and b show micrographs of the specimen centers and surfaces compressed at different die
velocities for fs = 50%. When the die velocity exceeds Vd = 100 mm s⁻¹, the solid grains and liquid phase
flow simultaneously, so the solid grains are distributed relatively homogeneously over the entire cross
section. Compressive deformation is observed in the middle part of the material, indicated by the sticking
together of solid grains and the relatively fine structure size.

Fig. 10 a-c shows the flow state of specimen cross-sections at a height-reduction ratio of 50% for die
velocities Vd of 500, 100, and 1 mm s⁻¹. When Vd is 500 mm s⁻¹ (Fig. 10 a), separation of the
solid and liquid phases is not observed, but when Vd = 1 mm s⁻¹ (Fig. 10 c), the
separation is clear; after deformation, the boundary contains
significant porosity and much of the liquid is distributed around the surface.


a. Vd=500mm/sec b. Vd=100mm/sec c. Vd=1mm/sec

Fig.10. Microphotograph of cross-section compressed at fs = 50%, reduction ratio = 50%

Fig. 11 shows the relationship between true strain and stress for variation in strain rate with A357 at a
solid fraction of 50%. With a high strain rate of 0.588 s⁻¹, the true strain at maximum stress is 0.1.
However, with strain rates below 29.4 s⁻¹, the maximum stress occurs at a true strain of 0.15. The stress
decreases remarkably from the maximum stress when the strain rate increases, as shown in Figures 5 and 6.

Fig. 12 shows the stress-strain curve for A390 alloy. Except for a strain rate of 0.588 s⁻¹, the maximum
stress is obtained at a true strain of 0.15. In the forming process for SSM, the strain rate should be
increased at the stress peak positions shown in Figures 5, 6, 11 and 12 so that the stress continues
to increase.




Fig. 11. Engineering stress-true strain curve (A357)
for variation of strain rate at fs = 50%


Fig.  12.  Engineering  stress-true  strain  curve  (A390) 
for  variation  of  strain  rate  at  fs  =  50% 


To understand this phenomenon, the continuous increase in stress with increasing strain rate is
investigated in another compression test. Fig. 13 shows the stress-strain curves obtained when the
compression velocity is changed at a strain of ε = 0.1. As shown in Fig. 13, the stress increases
continuously during the compression test when the strain rates of 5.88×10⁻², 5.88×10⁻¹, 2.353 and
7.058 s⁻¹ are increased to 5.88×10⁻¹, 2.353, 7.058 and 29.4 s⁻¹, respectively. When the compression
velocity changes, a discontinuity is found at a strain of 0.17. The sharp decrease of stress there is
attributed to the time the Material Testing System needs to control the velocity during a large change in
die speed. Fig. 14 shows the strain rate jumps used to obtain a continuously increasing stress-strain
curve: the initial strain rate is 5.88×10⁻² s⁻¹, and the first and second jump strain rates are 5.88×10⁻¹
and 5.88 s⁻¹, respectively. As in Fig. 13, the data error at the jump points is attributed to the
sensitivity of the MTS to the velocity change.


Fig.  13.  Engineering  stress-strain  curve  obtained  by 
varying  the  initial  compression  velocity  to 
prevent  liquid  segregation. 


Fig. 14. Engineering stress-strain curve obtained by
varying the initial compression velocity at
0.588 s⁻¹ and 0.0588 s⁻¹ to prevent liquid
segregation.


According to the above experiments, the forming limit is improved because the lower the solid fraction,
the lower the load. The distribution of the solid and liquid phases is homogeneous at Vd = 100 mm s⁻¹,
where no distinct boundary between them is visible, as shown in Fig. 10 a-c.

CONCLUSIONS 

In the compression test of semi-solid aluminum materials, the following results were obtained from an
investigation of the deformation and transformation of the macrostructure with respect to the strain rate:

1. From the macrostructure change appearing in the compression behavior of SSM, a more homogeneous
structure can be obtained with greater compressive velocity.




2. In the compression tests of SSM, macro-separation appears between the solid and the liquid regions
because of outflow of the liquid. With greater compressive velocity, densification of the structure was
observed in the center of the column and reduced porosity was observed at the surface.

3. After compression forming of Al2024 alloy, the faster the deformation rate, the better the distribution of
the solid and liquid phases. The critical rate to distribute solid and liquid phases homogeneously is about
Vd = 100 mm s⁻¹.

4. From compression experiments using A357, A390 and Al2024, a database of semi-solid materials can
be obtained for preventing liquid segregation in the compression forming process by changing the strain
rate. This technique is suggested as an intelligent way to conduct compression experiments by control of
the solid fraction and liquid phase flow conditions.


ACKNOWLEDGEMENT 

This work has been supported by the Engineering Research Center for Net Shape and Die Manufacturing
(ERC/NSDM) of Pusan National University, which is financed jointly by the Korean Science and
Engineering Foundation (KOSEF).


REFERENCES 

1. C.G. Kang, J.S. Choi, D.W. Kang, 1998. A filling analysis of the forging process of semi-solid aluminum
materials considering solidification phenomena. J. Materials Processing Technology, 73, 289-302.

2. M.C. Flemings, 1991. Behavior of Metal Alloys in the Semi-Solid State. Metallurgical Transactions,
22A, 957-981.

3. D.B. Spencer, R. Mehrabian and M.C. Flemings, 1972. Rheology of Semi-Solid Dendritic Sn-Pb Alloys
at Low Strain Rates. Metall. Trans., 3, 1925-1932.

4. M. Suery and M.C. Flemings, 1982. Effect of Strain Rate on Deformation Behavior of Semi-Solid
Dendritic Alloys. Metall. Trans., 13A, 1809-1819.

5. S. Toyoshima, 1994. A FEM Simulation of Densification in Forming Processes for Semi-Solid
Materials. Proceedings of the 3rd Int'l Conf. on Processing of Semi-Solid Alloys and Composites,
University of Tokyo, 47-62.

6. C.G. Kang, B.S. Kang, and J.L. Kim, 1998. An investigation of the mushy state forging process by the
finite element method. J. Materials Processing Technology, 80-81, 444-449.

7. M.P. Kenny, J.A. Courtois, R.D. Evans, G.M. Farrior, C.P. Kyonka, A.A. Couch, K.P. Young. Semi-Solid
Metal Casting and Forming. Metals Handbook, 9th Ed., 15, 327-338.

8. C. Yoshida, M. Moritaka, S. Shinya, S. Yahata, K. Takebayashi, A. Nanba, 1992. Semi-Solid Forging
Aluminium Alloy. 2nd Int'l Conf. on Processing of Semi-Solid Alloys and Composites, MIT, 95-102.

9. C.G. Kang, J.S. Choi, 1998. Effect of gate shape and forging temperature on the mechanical properties in
the injection forging process of semi-solid aluminum material. J. Materials Processing Technology, 73, 251-63.

10. C.P. Chen, C.-Y.A. Tsao, 1996. Semi-solid deformation of A356 Al alloys. Proceedings of the 4th
International Conference on Semi-solid Processing of Alloys and Composites, 16-20.




Adaptability to Frictional Change of Fuzzy Adaptive Blank Holder Control for Deep Drawing

S. Yoshihara*, K. Manabe** and H. Nishimura**

* Tokyo National College of Technology, Dept. of Mech. Eng., 1220-2 Kunugida-machi, Hachioji-shi, Tokyo 193-8610, Japan

** Tokyo Metropolitan University, Dept. of Mech. Eng., 1-1 Minami-ohsawa, Hachioji-shi, Tokyo 192-0397, Japan


ABSTRACT 

The validity of the fuzzy adaptive variable blank holder force (BHF) control system for deep drawing under frictional change during the process has been studied for steel sheet. Circular-cup deep-drawing experiments were carried out using steel sheet (SPCD) with high anisotropy (r = 1.57). To change the lubrication condition, the partial lubrication method was adopted using fluorine lubricant. The friction coefficient was evaluated by the plastic deformation model. It is confirmed for steel sheet that the BHF was properly controlled in response to the friction change at the flange.


INTRODUCTION 

Friction between the blank and tools in sheet metal forming is a very important process variable governing formability. Many studies on the friction coefficient and lubrication conditions have been conducted. For example, the friction coefficient in deep drawing was determined from continuum plasticity mechanics, and the effects of the material surface characteristics and the lubrication conditions were studied [1]. The friction coefficient between punch and blank was calculated from a plastic deformation model using experimental results [2]. The results showed that the lubrication condition varied during the process. From these results, the process characteristics are also considered to vary due to variation in the friction coefficient.

The BHF, like friction, is a very important process variable for improving product quality. In recent years, the variable BHF technique for suppressing flange wrinkles was studied for circular-cup deep-drawing [3], [4]. In addition, a variable BHF method based on the fracture limit and a combined method using wrinkle and fracture limit curves were developed [5]. For rectangular panel stamping, the effects of variable BHF were also studied. However, a variable BHF control system that responds to frictional change during the process has not been studied, even though such change affects the drawing limit and the product quality.

In the previous study [6], we developed a fuzzy adaptive BHF control system in order to optimize the circular-cup deep-drawing process and to improve the process flexibility in coping with frictional change. Two evaluation functions and a constraint function for deciding the progress of the deep-drawing process are used in the system. The BHF was calculated and inferred from the membership functions and if-then rules. Meanwhile, the friction coefficient between blank and tools was evaluated from the experimental results for aluminum alloy sheet by a newly proposed formulation based on the plastic deformation model. In the formulation, the following four factors were considered: the die contact angle with the thickness, the lifting-up behavior, the stretching at the punch shoulder and the change of the contact boundary at the die hole as the process progresses. The lubrication types of the blank tested were fluorine coating, acetone degreasing and a combination of them for varying the friction state during the process. The calculated friction coefficients agree well with the empirical results under the variable BHF condition as well as under the constant BHF condition. The results of fuzzy adaptive BHF control show that the BHF is adaptively controlled in response to the variation in friction for aluminum alloy sheet (A5182-O). Note that


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




Fig. 1 Geometrical shape of deformed blank (panels: α = 0; with cone angle α)

aluminum alloy is a typical face-centered cubic (fcc) metal and has low plastic anisotropy (r-value < 1). The alternative crystal structure is body-centered cubic (bcc), for which the typical metal is steel with high plastic anisotropy (r-value > 1). The objective of this study is to confirm that the fuzzy adaptive BHF control system achieves high reliability and adaptability to any frictional change for steel sheet with high r-value.

THEORY 

In circular-cup deep drawing, taking account of the friction resistance at the flange caused by the BHF and the bending/unbending at the die shoulder, the drawing stress (radial stress) on an element at the die throat is given by


Fig.2  Schematic  of  blank  at  die  shoulder  part 


Notation

ΔDR* : blank reduction ratio
DR : drawing ratio
R0 : initial radius of circular sheet blank
r0 : current radius of circular sheet blank
rp : punch shoulder radius
Rp : punch radius
rd : die shoulder radius
Rd : die hole radius
t0 : initial blank thickness
t : blank holder displacement
S : punch stroke
α : cone angle of body part
σeq : mean equivalent stress
εeq : mean equivalent strain
μ : friction coefficient


o  =e<H,Bi±^oe?ln^  +  i^-  +  ^  1. 

| «  1  +  2r  R0  nr0t  4 rd  j  4 rd 
The constitutive equation of the blank material is

$$\sigma_{eq} = C\,\varepsilon_{eq}^{\,n} \qquad (2)$$

The mean equivalent strain in the flange can be written using the Mises-Hill associated equivalent strain and the mean radius of the whole flange.

f  r  .  „  -,21 


v-*2+ 


’  V  2(1  +  2r) 

Punch  load  by  using  drawing  stress  is  expressed  by, 


ro 

c  1 

Ro 

nr0t 

4rdj 

P  =  2nRdts\na 


The  above  equation  is  re-expressed  by  Taylor  expansion, 

P  =  2jt/?rf?sina  (1  +  pa){ J— ■■ +  ^  a eq  In  —  +  \  +  - 


1  +  2  r  Rp  7tr0f  4  rd  4  rd 




//al 


nr0t  j 


.  „*+  £lL+a( 
W 


«o  Hrd 

Therefore, the friction coefficient can be written as

$$\mu = \frac{-E + \sqrt{E^2 - 4DF}}{2D} \qquad (7)$$


1  +  2  r  Rn  4  rd  \  V  1  +  2r  R0  2  rd 


0  eqt 

P 

2  rd 

2nRdtsma 

=  0 


6. 


7. 


2Z) 


$$D\mu^2 + E\mu + F = 0 \qquad (8)$$


where 
D  = 


MU  9.,.  +  9.b.  F-i 

nrat  K r0t  V  1  +  2r  /f0  4rrf  V 


2('  +  r)  o  In  —  +  — ^ 


1  +  2  r 


2r, 


9.c. 


2nRdtsmo, 
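The root selection in Eqs. (7) and (8) can be sketched numerically as follows. This is a minimal illustration only: the function name `friction_coefficient` is hypothetical, and the coefficients D, E and F are passed in as plain numbers rather than computed from Eqs. (9.a) to (9.c).

```python
import math


def friction_coefficient(D, E, F):
    """Return the root of D*mu**2 + E*mu + F = 0 taken with the '+' sign, as in Eq. (7).

    D, E and F are assumed to be already evaluated from the measured
    punch load, BHF and geometry; this sketch does not compute them.
    """
    if D == 0.0:
        raise ValueError("D must be non-zero for the quadratic form of Eq. (8)")
    discriminant = E * E - 4.0 * D * F
    if discriminant < 0.0:
        raise ValueError("no real friction coefficient for these coefficients")
    return (-E + math.sqrt(discriminant)) / (2.0 * D)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the experiment.
    print(friction_coefficient(2.0, 1.0, -0.12))
```

With the illustrative coefficients above the returned value is close to 0.1; real inputs would come from the in-process measurements.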

In Fig. 1, the die contact angle α1 can be expressed from the geometrical relationship between the die contact angle and the punch stroke as

$$\alpha_1 = \cos^{-1}\left(\frac{-A_2 + \sqrt{A_2^2 - 4A_1A_3}}{2A_1}\right) \qquad (10)$$

$$A_1\cos^2\alpha_1 + A_2\cos\alpha_1 + A_3 = 0 \qquad (11)$$


where 


12>  ,  „ 


2(rp+rrf+t0)(S-r p-rd-t0)  .  ,  (r„  +  + 10)2 


(Cpj+rp+rj)2 


(Cpd+rp+rd)2 


-  +  1  12.b.  T,=-^ 


(cpj  +rP  +rd ) 


--1 


12. c. 


Next, when the BHF is low, a die uncontact angle arises, as shown in Fig. 2, because the blank is lifted up by bending at the die shoulder while remaining straight. The geometrical relationship between the blank holder displacement and the die uncontact angle is

(T-t0)  +  rd(  l-cosa2) 


tana. 


12  rd  sin  a  2  +  Vo  -(Rd+rd)} 
In Fig. 2, the die uncontact angle α2 can be expressed as

$$\cos\alpha_2 = \frac{-B_2 + \sqrt{B_2^2 - 4B_1B_3}}{2B_1} \qquad (14)$$

where 


13. 


$$B_1\cos^2\alpha_2 + B_2\cos\alpha_2 + B_3 = 0 \qquad (15)$$


(T-t0-rd)2 


B  v  •„  •«>/  +]  16-a.  B  - 


2 (T-t0  -rd)rd 


£>.  =  — - 'W  r  1  AV/.C4.  Uf 2  —  -f  .19 

Vo  ~(Rd+rd )}  Vo  -  (Rd  +  rd  )} 

The actual contact angle α is expressed by


16.b. 


B3  = 


Vo  -  (Rd  +  rd  )}' 


-1 


16.c. 


$$\alpha = \alpha_1 - \alpha_2 \qquad (17)$$
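As a rough illustration of how Eqs. (10), (14) and (17) fit together, the sketch below solves each quadratic in the cosine and subtracts the two angles. The helper names are hypothetical, and the coefficient triples (A1, A2, A3) and (B1, B2, B3) are assumed to be supplied already evaluated from the tool geometry and sensor readings.

```python
import math


def angle_from_cosine_quadratic(c1, c2, c3):
    """Solve c1*cos(a)**2 + c2*cos(a) + c3 = 0 for a, using the '+' root."""
    discriminant = c2 * c2 - 4.0 * c1 * c3
    if c1 == 0.0 or discriminant < 0.0:
        raise ValueError("quadratic has no usable real root")
    cosine = (-c2 + math.sqrt(discriminant)) / (2.0 * c1)
    if not -1.0 <= cosine <= 1.0:
        raise ValueError("root is not a valid cosine")
    return math.acos(cosine)


def actual_contact_angle(a_coeffs, b_coeffs):
    """alpha = alpha1 - alpha2, with alpha1 from the A set and alpha2 from the B set."""
    alpha1 = angle_from_cosine_quadratic(*a_coeffs)
    alpha2 = angle_from_cosine_quadratic(*b_coeffs)
    return alpha1 - alpha2


if __name__ == "__main__":
    # Illustrative coefficients: cos(alpha1) = 0.5 and cos(alpha2) = 1.
    print(math.degrees(actual_contact_angle((1.0, 1.5, -1.0), (1.0, 2.0, -3.0))))
```

The example coefficients are chosen only so that the roots are easy to check by hand; they do not correspond to the tooling in Table 4.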


BHF  ADAPTIVE  CONTROL 

Figure 3 shows the block diagram of the fuzzy adaptive BHF control process. The computer can capture the following five sensing data online: punch load, punch stroke, maximum apparent thickness (blank holder displacement), punch speed and blank reduction ratio. The fuzzy rules are constructed from two evaluation functions (punch stroke curve and maximum apparent thickness curve) and one constraint function.

Punch  stroke  curve 

An ideal punch stroke curve can be obtained geometrically by assuming constancy of the wall thickness. The evaluation function φ is the difference between the actual punch stroke curve and the ideal curve, and another evaluation function φ′ is the differential coefficient of φ. An evaluation function ωs is estimated from the combination of φ and φ′. When ωs increases, the BHF has to decrease to minimize the evaluation function.

Maximum  apparent  blank  thickness  curve 

Blank holder displacement, which corresponds to the blank thickness at the flange edge (maximum apparent blank thickness), was measured instead of the wall thickness distribution. The difference between the maximum apparent blank thickness and the initial blank thickness is used as the evaluation function ϕ, whereas another evaluation function ϕ′ is the differential coefficient of ϕ, which relates to flange wrinkling. Another evaluation function ωt is obtained from the combination of ϕ and ϕ′. When ωt increases, the BHF has




to  increase  to  minimize  the  evaluation  function. 
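The two pairs of evaluation functions described above share one pattern: a deviation from a reference and its differential coefficient, merged into a single index. A minimal sketch follows, under stated assumptions. The text does not say how the deviation and its derivative are combined, so the weighted sum with gains `k_p` and `k_d` is purely hypothetical, as are the function names and the use of a backward difference over a fixed sampling step `dt`.

```python
def deviation_and_rate(actual, reference, dt):
    """Return the deviation samples and their backward-difference rate.

    actual, reference: equally sampled sequences (e.g. punch stroke vs. ideal
    stroke, or apparent thickness vs. initial thickness).
    dt: sampling step; differentiating with respect to time is an assumption.
    """
    deviation = [a - r for a, r in zip(actual, reference)]
    rate = [0.0] + [(deviation[i] - deviation[i - 1]) / dt
                    for i in range(1, len(deviation))]
    return deviation, rate


def combined_index(deviation, rate, k_p=1.0, k_d=1.0):
    """Hypothetical combination: weighted sum of deviation and rate."""
    return [k_p * d + k_d * r for d, r in zip(deviation, rate)]


if __name__ == "__main__":
    stroke_actual = [0.0, 1.25, 2.5]
    stroke_ideal = [0.0, 1.0, 2.0]
    phi, phi_rate = deviation_and_rate(stroke_actual, stroke_ideal, dt=0.5)
    print(combined_index(phi, phi_rate, k_p=1.0, k_d=2.0))
```

The same two helpers would serve for the thickness-based index by passing the measured blank holder displacement as `actual` and a constant initial thickness as `reference`.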


Fig. 3 Fuzzy adaptive control block diagram for optimal BHF control system


Table 1 If-then rules for deciding the ΔBHF

IF (x, ωs, ωt)                             THEN (ΔBHF)
x is PB and ωs is LA and ωt is LA          ΔBHF_PLL = 0
x is PB and ωs is LA and ωt is SM          ΔBHF_PLS = 1.50
x is PB and ωs is SM and ωt is LA          ΔBHF_PSL = -0.25
x is PB and ωs is SM and ωt is SM          ΔBHF_PSS = 1.00
x is ZO and ωs is LA and ωt is LA          ΔBHF_ZLL = 0
x is ZO and ωs is LA and ωt is SM          ΔBHF_ZLS = 0.50
x is ZO and ωs is SM and ωt is LA          ΔBHF_ZSL = -0.50
x is ZO and ωs is SM and ωt is SM          ΔBHF_ZSS = 0
x is NS and ωs is LA and ωt is LA          ΔBHF_SLL = 0
x is NS and ωs is LA and ωt is SM          ΔBHF_SLS = 3
x is NS and ωs is SM and ωt is LA          ΔBHF_SSL = -0.25
x is NS and ωs is SM and ωt is SM          ΔBHF_SSS = 2.00
x is NB and ωs is LA and ωt is LA          ΔBHF_BLL = 0
x is NB and ωs is LA and ωt is SM          ΔBHF_BLS = ?.00
x is NB and ωs is SM and ωt is LA          ΔBHF_BSL = -0.25
x is NB and ωs is SM and ωt is SM          ΔBHF_BSS = 3.00


Table 3 Lubricating patterns of blank

1) Degreasing with acetone (Acetone degreasing)
2) Fluorine coating (Fluorine)
3) Inner fluorine (Fluorine/Degreasing)
4) Outer fluorine (Degreasing/Fluorine)

Table 2 Mechanical properties of SPCD sheet used

Yield stress σs (N/mm2)         173
Tensile strength σB (N/mm2)     311
Breaking elongation (%)         51.3
C value* (N/mm2)                522
n value                         0.22
r value                         1.57

* see Eq. (2)


Table 4 Specification of punch and die used in the experiment

Punch shoulder radius rp (mm)   4
Punch diameter Dp (mm)          33
Die shoulder radius rd (mm)     3
Die diameter Dd (mm)            36.5

Constraint  function 

The constraint function decides the progress of the deep drawing process. The controlled BHF value was calculated and inferred by fuzzy logic using the two evaluation functions mentioned above and the constraint function in order to achieve optimal control during the process even when the friction changes. The controlled increment of BHF is calculated by the algebraic product-barycenter method from a combination of the membership functions and the if-then rules shown in Table 1.
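The algebraic product-barycenter inference can be sketched as below: each rule fires with the product of its three membership grades, and the BHF increment is the firing-weighted average of the rule consequents. The consequents are those of Table 1. Everything else is an assumption made for illustration: the triangular membership shapes, the normalized input ranges (x in [-1, 1], ωs and ωt in [0, 1]), the set centers, and all function names. The consequent for (NB, LA, SM) is illegible in the source table, so it is left as `None` and the sketch refuses to use it.

```python
# Assumed set centers on normalized axes (not given in the text).
X_CENTERS = {"NB": -1.0, "NS": -0.5, "ZO": 0.0, "PB": 1.0}
W_CENTERS = {"SM": 0.0, "LA": 1.0}

# Rule consequents from Table 1, keyed by (x label, omega_s label, omega_t label).
RULES = {
    ("PB", "LA", "LA"): 0.0, ("PB", "LA", "SM"): 1.50,
    ("PB", "SM", "LA"): -0.25, ("PB", "SM", "SM"): 1.00,
    ("ZO", "LA", "LA"): 0.0, ("ZO", "LA", "SM"): 0.50,
    ("ZO", "SM", "LA"): -0.50, ("ZO", "SM", "SM"): 0.0,
    ("NS", "LA", "LA"): 0.0, ("NS", "LA", "SM"): 3.0,
    ("NS", "SM", "LA"): -0.25, ("NS", "SM", "SM"): 2.00,
    ("NB", "LA", "LA"): 0.0, ("NB", "LA", "SM"): None,  # illegible in source
    ("NB", "SM", "LA"): -0.25, ("NB", "SM", "SM"): 3.00,
}


def memberships(value, centers):
    """Triangular grades that sum to 1; centers must be listed in ascending order."""
    labels = list(centers)
    points = [centers[label] for label in labels]
    v = min(max(value, points[0]), points[-1])  # clamp to the covered range
    grades = {label: 0.0 for label in labels}
    for i in range(len(points) - 1):
        low, high = points[i], points[i + 1]
        if low <= v <= high:
            grades[labels[i + 1]] = (v - low) / (high - low)
            grades[labels[i]] = 1.0 - grades[labels[i + 1]]
            break
    return grades


def infer_delta_bhf(x, omega_s, omega_t):
    """Algebraic product for rule firing, barycenter (weighted mean) for the output."""
    gx = memberships(x, X_CENTERS)
    gs = memberships(omega_s, W_CENTERS)
    gt = memberships(omega_t, W_CENTERS)
    numerator = denominator = 0.0
    for (lx, ls, lt), consequent in RULES.items():
        weight = gx[lx] * gs[ls] * gt[lt]
        if weight == 0.0:
            continue
        if consequent is None:
            raise ValueError("rule %s fired but its consequent is unknown" % ((lx, ls, lt),))
        numerator += weight * consequent
        denominator += weight
    return numerator / denominator


if __name__ == "__main__":
    print(infer_delta_bhf(0.5, 1.0, 0.0))
```

Because the assumed membership grades on each axis sum to one, the denominator never vanishes. Inputs that would fire the unreadable rule raise an error rather than silently substituting a guessed value.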




Fig.4  Photograph  of  deep  drawing  system 


Fig. 5 Effect of lubrication conditions on friction coefficient μ-ΔDR* relation (constant BHF test); axes: μ (0 to 0.4) vs. ΔDR* (0 to 0.7)


EXPERIMENT 

Steel sheet for deep drawing (SPCD) of 1.0 mm thickness was employed in the experiment. The material properties of the blank are listed in Table 2. The drawing ratio of the blank used is 1.98. The lubricating patterns used are dry fluorocarbon (spray type) and degreasing with acetone, as shown in Table 3. The deep drawing apparatus, which was previously developed by the authors [6], is capable of computerized control of BHF and punch speed during the deep drawing process. This system is equipped with sensors for punch displacement, punch load, blank holding force, blank holder displacement and radial drawing displacement of the blank flange. Figure 4 shows a photograph of the deep drawing control system. The specifications of the die and punch used are shown in Table 4. All the experiments were conducted at a punch speed of v = 5 mm/min.


RESULTS  AND  DISCUSSION 

Figure 5 shows the friction coefficient μ estimated in a constant BHF (2 kN) test for two different lubrication conditions, acetone degreasing and fluorine. The mean value of μ is approximately 0.15 for acetone degreasing and 0.05 for fluorine. The relationship between μ and the lubrication gives valid results because the anisotropy of the steel sheet is taken into consideration. Figure 6 shows the punch load and BHF vs. ΔDR* curves in a fuzzy adaptive BHF control test with three types of lubrication: acetone degreasing, fluorine and a combination of them (as shown in Table 3). In the combination (degreasing/fluorine), the punch load curve is very close to that of acetone degreasing during the initial stage (until about ΔDR* = 0.2). However, from the middle to the last stage, the punch load curve gradually falls below that of acetone degreasing and approaches the fluorine result. Similarly, the fuzzy adaptive BHF curve is very close to that of acetone degreasing in the early stage, but in the middle to the last stages it falls below the acetone degreasing curve and approaches the fluorine curve, as in the case of the punch load curve. As a result, it is confirmed that the adaptive BHF is appropriately controlled to cope with the frictional change. Figure 7 shows the variation in μ predicted by Eq. (7) for four lubricating types. In the case of acetone degreasing, μ is the highest, and for fluorine the lowest. Meanwhile, the result for combined lubrication shifts partway through the process to the other state, corresponding to the lubricating pattern. From the results, it is confirmed that the fuzzy adaptive BHF control system is valid for steel sheet with high r-value as well as aluminum alloy [7] and possesses high adaptability to friction change during the process.


CONCLUSIONS 

1.  The  fuzzy  adaptive  BHF  control  deep  drawing  system  can  achieve  high  reliability  and  adaptability  to 
any  friction  condition  including  frictional  change  during  the  process  for  steel  sheet  with  high  r-value. 

2.  The  anisotropy  of  blank  material  has  to  be  considered  in  order  to  evaluate  the  friction  coefficient. 

3.  The  friction  evaluation  formula  from  a  simple  plastic  model  is  confirmed  to  be  effective  for  steel  sheet, 
using  the  partial  lubrication  combination  of  acetone  degreasing  and  fluorine. 




Fig. 6 Punch load and BHF vs. ΔDR* curves

Fig. 7 Effect of lubrication conditions on friction coefficient μ-ΔDR* relation (fuzzy adaptive BHF control test)


REFERENCES 

1.  B.  Kaftanoglu,  1973.  Determination  of  Coefficient  of  Friction  under  Conditions  of  Deep-Drawing  and 
Stretch  Forming.  Wear,  25,  177. 

2. S. Rajagopal, 1981. A Deep Drawing Test for Determining the Punch Coefficient of Friction. Trans. ASME J. Eng. Ind., 103, 197.

3.  N.  Kawai,  1961.  Critical  Conditions  of  Wrinkling  in  Deep  Drawing  of  Sheet  Metals.  Reports  1,2  and  3, 
Bull.  J.S.M.E.,  4,  169. 

4.  S.  Thiruvarudchelvan  and  J.  Gan,  1994.  Drawing  of  Hemispherical  Cups  with  Friction-actuated  Blank 
Holding.  J.  Mater.  Process.  Technol.,  40,  327. 

5.  Y.W.  Wang  and  A.  Majilessi,  1994.  The  Design  of  An  Optimum  Binder  Force  System  for  Improving 
Sheet Metal Formability. Proc. 18th IDDRG Biennial Congress, 491.

6.  S.  Yoshihara,  K.  Manabe,  M.  Yang,  and  H.  Nishimura,  1997.  Fuzzy  Adaptive  Control  of  Circular-Cup 
Deep-Drawing  Process  Using  Variable  Blank  Holder  Force  Technique.  J.  JSTP  38-435  (in  Japanese),  46. 

7.  S.  Yoshihara,  K.  Manabe,  and  H.  Nishimura,  1998.  Fuzzy  Adaptive  Control  of  Blank  Holder  in 
Circular-Cup  Deep-Drawing  (Adaptability  to  Frictional  Change  and  Simple  Evaluation  Lubrication). 
Trans.  JSME  64-624  (in  Japanese),  3209. 




AN AI PROCESS CONTROL SYSTEM WITH SIMULATION
DATABASE AND ADAPTIVE FILTER FOR V-BENDING

Ming  Yang*+  ,  Atsushi  Katayama*,  Ken-ichi  Manabe*,  Naoyuki  Aikawa** 

*  Department  of  Mechanical  Engineering,  Tokyo  Metropolitan  University, 

1-1  Minami-osawa,  Hachioji-shi,  Tokyo,  192-0397,  Japan 
+ E-mail yang@mech.metro-u.ac.jp

**  Department  of  Electronics,  Faculty  of  Engineering,  Tokyo  Engineering  University, 
1401-1  Katakura,  Hachioji-shi,  Tokyo  192-8580,  Japan 

ABSTRACT 

In this study, an artificial intelligence (AI) V-bending process control system with a numerical simulation database and an adaptive filter was proposed and developed to achieve production with high accuracy and flexibility. The punch force-stroke curve (F-S curve), which includes process information in a compound manner, is stored in the database as expertise and applied to evaluate and control the process. An FEM code was used to simulate the V-bending process to obtain the F-S curve during loading and the springback value during unloading. An online adaptive filter was applied to modify the simulated F-S curve and the simulated springback value. Furthermore, the concept of a multi-regional filter is proposed to improve filtering accuracy. The modified F-S curves and springback values are stored in the database as pseudo-experimental ones and used in V-bending process control with an intelligent process control system. The AI control system of the V-bending process was evaluated using four kinds of materials as workpieces. Results show that the FEM simulation database with online adaptive filtering is very effective for precision control. A highly accurate V-bending process was achieved without trial V-bending tests.

Keywords:  V-bending,  database,  F-S  curve,  FEM  simulation,  online  adaptive  filter,  AI  process  control 


INTRODUCTION 

In sheet metal forming, processes that can adapt to the production of a wide variety of parts in small batches are in demand. Accuracy is also important for automation of the process. However, a metal forming process may contain errors arising from unexpected variations in tool characteristics or in the incoming workpiece. In practical production, sheet metal forming processes still depend on manufacturing craftsmen because of the complexity of metal deformation and these unexpected variations. The craftsmen ponder the correctness of the setup and judge whether changes should be made. During the process, they use a blend of sensory skills, including vision, touch and sensitivity to forces, to gauge whether the process is proceeding according to plan. After a part is made, they measure it and then reevaluate the tools and other process control parameters. In doing so, the craftsmen learn and relate to the underlying form of the machine and the process [1]. In the V-bending process, the setup of the machine and the tool depends significantly on the thickness and material properties of the workpiece. In a practical process, the craftsmen adjust tools or control parameters to compensate for variations in the thickness, material properties and other conditions with their skills and expertise. Process control is concerned with monitoring in-process forming states and making adjustments to account for variations.

In this study, an intelligent control system with a simulation database and an online adaptive filter was proposed and developed for a precise V-bending process. In addition to the database, an adaptive filter and a fuzzy inference model are included in the control system. Here, the fuzzy inference model with the database corresponds to the craftsmen's skills and expertise. An online adaptive filter was developed to modify the simulated F-S curve and the simulated springback value in-process. Furthermore, a multi-regional filter was proposed to improve the filtering accuracy. The developed control system was evaluated by bending tests with several kinds of materials, without any trial V-bending tests.


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




Fig.  1.  Concept  of  intelligent  V-bending  process  control  system  with  database  and  fuzzy  inference  model. 


INTELLIGENT  V-BENDING  PROCESS  CONTROL  SYSTEM  WITH 
SIMULATION  DATABASE  AND  ONLINE  ADAPTIVE  FILTER 

Figure 1 shows an intelligent control system with expertise developed by the authors in order to compensate for variations in thickness and material properties of the workpiece and also in the machine and tool conditions [2]. Here, a database and a fuzzy inference model corresponding to the craftsmen's skills and expertise were applied to online process control. In this system, the punch force-stroke curves (F-S curves) were measured in-process and stored in the database together with other factors, such as bend angle, tool conditions, and ambient temperature. Figure 2 shows the F-S curves measured for different materials. Discrepancies between the curves correspond to variations in the geometrical conditions, material and other parameters in a compound manner. The fuzzy inference model for evaluating the F-S curves corresponds to blending the sensory information synthetically to decide the process conditions.

For widely applicable and robust process control, the authors proposed using FEM analysis to simulate the V-bending process in order to construct a simulation database for the control system. Since FE analysis is widely applied to simulate metal forming processes and its accuracy has been significantly improved in this decade [3], an FEM code is applied to simulate the V-bending process. The concept of the system with the simulation database is shown in Fig. 3 [4]. The system consists of an FEM simulator, a simulation database, and an online adaptive filter in addition to the control system shown in Fig. 1. Prior to the V-bending operation, simulation of the process is carried out using a series of material properties of the workpiece to cope with scatter in the material properties. The simulated results are stored in the simulation database. During the V-bending process, the parameters measured in-process, punch force and stroke, are compared with those in the database. The discrepancies between the simulated and experimental parameters are evaluated and used to design an adaptive filter that compensates the simulation results in the database; the filtered simulation results are then fed to the fuzzy inference model for process control.

Since the simulated V-bending process depends only on the deformation characteristics of the workpiece and not on the characteristics of the bending machine, it can easily be applied to various machines by evaluating the machine properties and designing an adaptive filter for the particular machine. Figure 4 shows the conceptual adaptive filter for a simulation database of V-bending processes with various machines. The signals in the simulation database are converted to the database for each machine by combining them with signals of the machine properties. Conversely, the signals in the database for a particular machine can also be divided into signals on the deformation of the workpiece and signals on the machine. As a practical application,




Fig. 2. F-S curve for various materials: aluminum alloy (Al), stainless steel (SUS), mild steel (SPC) and brass (horizontal axis: punch stroke /mm, 0 to 6)

Fig. 3. Concept of intelligent V-bending process control system with simulation database (off-line procedure: tensile test, simulation by FEM)

a database of an old machine can be converted to one for a new machine by separating out the information on the old machine and then combining the remainder with the information on the new machine.

Characteristics  of  the  process  control  system  with  simulation  database  and  adaptive  filter  are  as  follows: 

-  All of the information in the database is obtained from the basic material properties measured by tensile test, and thus trial-and-error V-bending tests are not necessary even for new materials.

-  Reliability of the filtered simulation information may become as high as that of experimental information.


Fig. 4. Application of adaptive filter for different machines (machine A, machine B, machine C)

-  Since the database is constructed by FEM simulation, the data distribution corresponding to the scattering range of the material properties and the process conditions can be easily designed and obtained. Information uniformly distributed in the database is important for highly accurate process control [5].

-  It is easy to convert the simulation database to any bending machine, or to convert the database of an old machine to a new one, saving resources and time.




COMPENSATION  OF  SIMULATION  RESULTS 
WITH  ONLINE  ADAPTIVE  FILTER 

The authors proposed an online adaptive filter to compensate the process information from the simulation in this study. The proposed online adaptive filter was designed to modify the calculated F-S curve and springback value of the V-bending process. Figure 5 shows the flow of modification by the adaptive filter. Both the calculated and the online-measured experimental F-S curves are transformed into the frequency domain by FFT. The spectra of the two curves are compared, and the differential spectra are employed to design the filter. A multi-regional filter is newly proposed in this study to improve the accuracy of filtering. The concept of modification of the F-S curve with the multi-regional filter is shown in Fig. 6. The F-S curve of the V-bending process can be divided into three areas: the elastic deformation area, the area around yielding and the plastic deformation area. The dominant spectra differ from one area to another. Therefore, applying adaptive filters to the three areas separately could significantly improve the filtering accuracy. The modified F-S curve is then transformed to the pseudo-experimental curve by inverse FFT.
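The flow described above (transform both curves, take the differential spectrum, correct the simulated spectrum, transform back, region by region) can be sketched as follows. This is an illustration under stated assumptions, not the authors' filter design: the correction is taken to be additive in the frequency domain, the optional `cutoff` that keeps only low-order differential components is hypothetical, the region boundaries are supplied by the caller as sample indices, and a plain discrete Fourier transform is used in place of an FFT to keep the sketch dependency-free.

```python
import cmath


def dft(samples):
    """Discrete Fourier transform (O(n^2); stands in for an FFT)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(spectrum):
    """Inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n).real
            for t in range(n)]


def differential_spectrum(simulated, measured, cutoff=None):
    """Spectrum of (measured - simulated); optionally keep only orders <= cutoff."""
    n = len(simulated)
    delta = [m - s for m, s in zip(dft(measured), dft(simulated))]
    if cutoff is not None:
        # min(f, n - f) treats mirrored bins alike so the output stays real.
        delta = [d if min(f, n - f) <= cutoff else 0j for f, d in enumerate(delta)]
    return delta


def apply_filter(simulated, delta):
    """Add the differential spectrum to the simulated spectrum and invert."""
    return idft([s + d for s, d in zip(dft(simulated), delta)])


def modify_multi_regional(simulated, measured, boundaries, cutoff=None):
    """Filter the elastic, yielding and plastic regions separately.

    boundaries: (i, j) sample indices splitting the curve into three regions.
    """
    i, j = boundaries
    modified = []
    for start, stop in ((0, i), (i, j), (j, len(simulated))):
        sim_part, meas_part = simulated[start:stop], measured[start:stop]
        delta = differential_spectrum(sim_part, meas_part, cutoff)
        modified.extend(apply_filter(sim_part, delta))
    return modified


if __name__ == "__main__":
    sim = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    meas = [0.0, 1.1, 2.3, 3.2, 4.1, 5.3, 6.2, 7.4, 8.1]
    print(modify_multi_regional(sim, meas, (3, 6), cutoff=0))
```

With every differential component kept, the modified curve reproduces the measured one in each region. With `cutoff=0` only the mean offset of each region is corrected, which shows why treating the three regions separately can matter when their dominant components differ.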

Figure 7(a) shows the calculated punch force-stroke curve (F-S curve) in comparison with the experimental one. The figure shows that there are some discrepancies between the two curves, and that the discrepancies are not uniform along the curves owing to the transition of deformation of the workpiece. Figure 7(b) shows a comparison of a modified (pseudo-experimental) F-S curve and the corresponding experimental one. It is seen that the pseudo-experimental F-S curve completely coincides with the experimental one, in contrast to the unfiltered one. Therefore, the database with the pseudo-experimental F-S curve can be used in the process control with accuracy equivalent to the experimental one.

Since the simulated punch strokes for the desired bend angles also include errors, which do not appear on the F-S curve, it is necessary to compensate the calculated punch stroke for the desired bend angle as well as the F-S curve. Figure 8 shows the calculated and experimental punch strokes vs. the bend angle around 90 degrees. The discrepancy is due to the same reasons discussed previously about the springback analysis. In this study, both the calculated and the experimental punch strokes are assumed to be inversely proportional to the bend angle, and the slopes of the two lines are assumed to be the same. The coefficients in the filter were applied to compensate the punch stroke and bend angle relationship.
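If the calculated and experimental stroke-angle relations are two straight lines with the same slope, as assumed above, the compensation reduces to a constant stroke offset. The sketch below illustrates that reading. The linear form, the single calibration point used to estimate the offset, and all function names and numbers are assumptions for illustration; the text does not specify how the filter coefficients enter this step.

```python
def calculated_stroke(angle_deg, slope, intercept):
    """Simulated punch stroke for a bend angle, modeled as a straight line."""
    return slope * angle_deg + intercept


def stroke_offset(measured_angle_deg, measured_stroke, slope, intercept):
    """Constant gap between the experimental and calculated lines,
    estimated from one measured (angle, stroke) pair."""
    return measured_stroke - calculated_stroke(measured_angle_deg, slope, intercept)


def compensated_stroke(target_angle_deg, slope, intercept, offset):
    """Calculated stroke for the target angle, shifted by the estimated offset."""
    return calculated_stroke(target_angle_deg, slope, intercept) + offset


if __name__ == "__main__":
    # Illustrative line and measurement, not data from Fig. 8.
    slope, intercept = -0.05, 10.0
    offset = stroke_offset(92.0, 5.6, slope, intercept)
    print(compensated_stroke(90.0, slope, intercept, offset))
```

Because the slopes are taken to be equal, one calibration point fixes the offset over the whole neighborhood of the target angle; unequal slopes would need at least two points.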


Fig. 5. Configuration of F-S curve modification with online adaptive filter

Fig. 6. Modification of F-S curve with multi-regional filter




Fig. 7. Modification of F-S curve by multi-regional adaptive filter (SUS304): (a) simulated vs. experimental; (b) modified (pseudo-experimental) vs. experimental


EVALUATION  OF  PROCESS  CONTROL  MODEL 

An offline evaluation of the V-bending process control system was carried out using the experimental results for four kinds of materials as the workpiece. The materials used are aluminum alloy (A5182-O), mild steel (SPCD), brass (C2600-O) and stainless steel (SUS304). Figure 9 shows the accuracy of the bend angle obtained by the intelligent process control system with the simulation database and the online adaptive filter. The results show that almost all bend angles fell within 90 ± 0.25 degrees and that the scatter among workpieces of the same material was very small. A highly accurate V-bending process was achieved although the information in the database was provided only by simulation, without trial bending. It is confirmed that the simulation database with the online adaptive filter is very effective for the control system, and that the multi-regional filter copes with various materials without loss of accuracy. Furthermore, the concept of AI process control with a filtered simulation database can be applied to other metal forming process controls as well.


Fig. 8. Comparison of experimental and simulated punch stroke-bend angle relationship
around the desired bend angle of 90 degrees

Fig. 9. Offline evaluation of process accuracy controlled using adaptive filter (horizontal axis: workpiece 1-8; materials: A5182-O, SPCD, C2600-O, SUS304; bands: 90°±15′, 90°±30′, larger than 90°±30′)


CONCLUSION 

An intelligent control system with a simulation database and an online adaptive filter was proposed and
developed for a precise V-bending process. The online adaptive filter modifies the simulated
F-S curve and the simulated springback value in-process. Furthermore, a multi-regional filter was applied
to improve the filtering accuracy. The evaluation results show that the simulation database with the online
adaptive filter is very effective for precision process control. A highly accurate V-bending process was
achieved without any trial V-bending tests.


ACKNOWLEDGEMENTS 

The authors would like to thank the Amada Foundation for Metal Work Technology for supporting this work.
Thanks  are  also  due  to  Dr.  H.  Ogawa  of  Polytechnic  University  for  his  kind  advice  on  the  FEM  simulation. 


REFERENCES 

1.  P.  K.  Wright,  D.A.  Bourne,  1988.  Manufacturing  Intelligence,  Addison  Wesley. 

2. M. Yang, N. Kojima, K. Manabe, H. Nishimura, 1997, JSME Int. J. Series C, 40(1), 157-162.

3.  A.  Makinouchi,  1996,  J.  Materials  Processing  Technology,  60,  19-26. 

4.  M.  Yang,  K.  Manabe,  1998,  J.  Metals  and  Materials,  4(3),  315-318. 

5.  M.  Yang,  K.  Manabe,  H.  Nishimura,  1996,  J.  Materials  Processing  Technology,  60,  249-254. 


Intelligent  Manufacturing  II 


The Distributed Intelligent Control of Complex Systems

Wayne  J.  Davis 


Professor  of  General  Engineering, 
University  of  Illinois  at  Urbana-Champaign, 
Urbana,  IL  61801 


ABSTRACT 

This  paper  begins  by  discussing  the  relationship  of  planning  and  control,  the  notion  of  intelligent  control, 
and the need for the distributed intelligent control of complex systems. The paper then presents an
architecture for distributed intelligent control. The first step is to define a modeling
paradigm  that  will  recursively  decompose  the  overall  system  into  a  collection  of  subsystems  where 
intelligent  control  exists.  Next,  the  structure  of  the  intelligent  controller  is  defined,  focusing  on  the  process 
of  task  assignment,  planning  and  execution. 


INTRODUCTION  AND  PROBLEM  STATEMENT 

Throughout  the  last  fifty  years,  the  engineering/scientific  community  has  witnessed  numerous  advances  in 
control  and  planning  technologies.  In  certain  cases,  these  advancements  have  evolved  from  new  analytical 
approaches. For example, in the planning area, new interior search algorithms were developed for solving
the linear programming problem. In the control field, H∞ control procedures represent a new approach for
which a rich analytical foundation has been developed. In other cases, however, the new technologies have
been  more  dependent  upon  computational  advances.  In  planning,  genetic  and  evolutionary  programming 
algorithms  are  certainly  dependent  upon  such  advances.  Similarly,  artificial  neural  nets  and  fuzzy  control 
methods  are  highly  dependent  upon  the  advances  of  our  computational  capabilities. 

Despite  continued  improvements  in  planning  and  control  technologies,  these  areas  continue  to  be  viewed  as 
distinct from each other. Yes, there have been interactions between the two technologies; for example,
mathematical programming procedures have been employed to solve constrained optimal control problems.
Similarly, optimal control approaches have been applied in planning. During the 1970s, there were
embryonic efforts to integrate these technologies through a comprehensive general systems approach. A few
authors such as Cannon, Cullum and Polak [1] addressed the relationships between planning and control in a direct fashion.

Perhaps the major reason that planning and control technologies are still viewed as being distinct from
each  other  results  from  the  mindset  that  the  practitioners  of  these  technologies  adopt  while  solving  their 
respective  problems.  Planners  are  typically  interested  in  formulating  a  plan,  but  seldom  specify  the  manner 
by  which  the  plan  will  be  implemented.  On  the  other  hand,  designers  of  control  systems  often  begin  with  a 
desired  behavior  and  then  employ  control  approaches  to  generate  this  behavior.  In  a  certain  sense, 
practitioners  of  each  technology  are  really  looking  at  one  half  of  a  problem. 

In  the  last  decade,  new  intelligent  control  technologies  have  evolved.  The  goal  of  intelligent  control  is  to 
integrate  planning  and  control  in  order  to  permit  a  system  to  plan  and  execute  its  response  in  an  on-line 
fashion.  The  intelligent  control  approach  should  be  contrasted  against  the  more  classical  optimal  control 
approach  that  begins  by  developing  an  optimal  course  of  action  in  an  off-line  setting  and  then  attempts  to 
implement that action by employing the generated control law. Intelligent control, however, recognizes the
fact that the planning problem is dependent upon the current state of the system. The current state of the
system,  in  turn,  is  influenced  by  control  inputs  derived  from  the  application  of  a  control  law  as  well  as 
external  inputs  into  the  system.  There  is  often  uncertainty  in  the  response  of  the  system.  For  example, 
external  inputs  to  the  system  are  often  difficult  to  predict.  Usually,  one  compensates  for  this  uncertainty  by 
employing  feedback  control  techniques.  However,  in  other  cases,  the  current  planning  problem  is  modified 
to  such  an  extent  that  a  new  control  policy  should  be  sought. 


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 


Bishop,  Shirkey  and  Spong  [2]  developed  an  interesting  example  of  intelligent  control  when  they  designed 
a  controller  that  would  permit  a  robot  to  play  air  hockey.  Using  a  vision  system,  the  control  system  first 
computed  the  trajectory  of  the  incoming  puck.  Next,  it  computed  the  optimal  trajectory  for  the  robotic  arm 
so  that  it  intercepted  the  incoming  puck  and  returned  it  toward  the  opponent’s  net.  It  is  obvious  that  each 
time  the  robot  responded  to  the  incoming  puck,  it  needed  a  new  trajectory  to  follow.  Thus,  planning  had  to 
be  addressed  in  an  on-line  fashion  and  was  certainly  dependent  upon  the  state  of  the  system  (i.e.  the 
trajectory  of  the  incoming  puck). 

Today,  intelligent  control  is  usually  employed  to  supervise  or  coordinate  actions  of  one  or  more  processes 
whose  state  evolution  can  be  described  via  differential  equations.  That  is,  the  subordinate  processes  are 
continuous  state  systems.  Seldom  has  more  than  one  level  of  supervisory  control  been  considered. 

On the other hand, modern advancements in computer technology have led to the design or contemplation
of  far  more  complex  systems  such  as  advanced  manufacturing  systems,  transportation  systems  and  so  forth. 
The  complexity  of  these  systems  necessitates  that  the  overall  supervisory/management  function  be 
addressed  in  a  distributed  manner.  That  is,  an  ensemble  of  controllers  must  be  defined  and  coordinated 
within  a  comprehensive  control  architecture.  While  some  of  the  controllers  supervise  processes,  others 
supervise  other  controllers.  An  effective  control  architecture  should  possess  the  following  properties: 

•  It  allows  the  user  to  specify  high-level  tasks,  which  are  then  recursively  decomposed  into  more 
detailed  tasks  that  are  eventually  executed. 

•  It considers planning and control problems on multiple time scales and at multiple levels of detail. That is,
planning and control become multi-resolutional in nature.

•  It  allows  complex  functional  behaviors  to  be  decomposed  into  more  manageable  subfunctions. 

•  It  also  permits  a  given  management  function  to  be  distributed  across  several  intelligent  controllers. 

If  one  looks  at  each  of  these  properties,  one  observes  that  each  addresses  a  different  type  of  decomposition. 
The first property deals with task decomposition. The second property considers both temporal
decomposition and aggregation/disaggregation principles. In the third property, functional
decomposition is considered. Finally, the fourth property addresses spatial decomposition. The important observation is
that  a  well-designed  control  architecture  must  address  all  the  above  decomposition  modes  concurrently. 

We  have  also  introduced  a  new  element  to  our  planning  and  control  problem  with  the  inclusion  of  tasks. 
Although  most  people  easily  understand  the  concept  of  a  task,  its  consideration  within  conventional 
planning and control procedures is not a trivial concern. Most control algorithms presume that the system's
dynamics  can  be  described  by  a  set  of  differential  or  difference  equations.  However,  when  tasks  are  being 
considered,  the  system  response  includes  discrete  events  where  the  system  state  can  change  abruptly. 
Examples  of  events  include  the  start  and  finish  of  a  task  as  well  as  points  in  time  where  controllers  interact 
with  each  other.  The  resulting  discrete-event  nature  of  these  systems  invalidates  the  application  of 
conventional  control  algorithms  to  these  systems. 

The  inclusion  of  tasks  also  complicates  planning  because  there  are  additional  concerns  to  be  addressed. 

•  Decomposition  of  high-level  tasks  into  more  detailed  subtasks  requires  definition  of  comprehensive 
relationships  among  the  tasks. 

•  Detailed  instructions  are  required  to  execute  each  individual  task  by  any  process  within  the  managed 
system. 

•  Planning  must  define  which  processing  resource  executes  each  task  and  then  schedule  its  execution. 

•  Often,  execution  of  a  task  by  a  given  process  will  require  additional  physical  entities  and  resources  to 
be  assembled  at  the  processing  resource.  That  is,  the  decision  to  execute  a  task  at  a  given  process  may 
generate  additional  tasks  that  must  be  executed  before  and  after  the  target  task  is  executed. 

The ability to consider all of these constraints is well beyond the scope of most current planning procedures.
Invariably, simplifications must be made that diminish the ability to specify a feasible plan; i.e., what is
planned cannot be implemented. The inherently infeasible nature of the generated plans further contributes
to the uncertainty of system response which, in turn, makes control requirements more difficult to address.


Finally,  current  planning  practices  seldom  consider  implementation  concerns  while  formulating  the  plan. 
Rather  the  plan  is  passed  to  another  control  element  that  attempts  to  implement  the  plan  until  deviations 
between  the  planned  and  actual  response  become  so  large  that  replanning  is  needed.  In  reality,  planning  is 
constantly  trying  to  catch  up  to  the  system  response,  and  the  system  is  never  in  control. 

Not only is there concern about integrating planning and control in a logical manner at a given controller,
but there is the additional concern of distributing the integrated planning and control function across several
intelligent controllers. After the planning and control functions are distributed, the implemented control
architecture  must  provide  mechanisms  by  which  the  controllers  can  interact  so  that  a  coordinated  response 
is  generated.  Again,  the  requirements  for  distributing  planning  and/or  control  in  this  complex  task 
execution  environment  are  well  beyond  current  technological  capabilities. 

Perhaps  the  most  troubling  issue  is  that  our  current  simulation  capabilities  do  not  permit  us  to  model  the 
behavior  of  a  system  while  it  operates  under  a  given  control  architecture  [3].  So,  not  only  is  there  little 
theoretical  guidance  to  specify  the  required  architecture,  there  is  no  means  to  test  a  particular  architecture. 
Often,  the  first  test  of  a  system’s  behavior  occurs  when  it  becomes  operational.  It  is  often  too  late  or  too 
costly  to  make  changes. 

Given the current state of affairs, a chasm has developed between the theorist and the practitioner. The
need to design and implement complex systems is increasingly important. Today, control systems are being
designed  for  complex  systems  on  an  ad  hoc  basis  with  little  or  no  theoretical  guidance.  The  resulting 
systems  often  fail  to  meet  their  expectations. 


OUR  INTEGRATED  SOLUTION 

Due  to  the  complexity  of  the  current  state  of  affairs,  there  is  no  simple  fix.  Major  advancements  are  needed 
in  the  areas  of  system  modeling,  planning  and  control.  More  importantly,  however,  there  is  a  need  to 
integrate  these  historically  distinct  topics  into  a  unified  approach  for  designing  and  implementing  the 
distributed  intelligent  control  architectures  that  are  needed  to  manage  these  systems.  In  the  remainder  of 
this  paper,  we  will  discuss  our  approach  to  this  problem. 

The  central  element  of  the  integrated  solution  is  a  new  modeling  paradigm.  This  modeling  approach  begins 
with a view of a large-scale system as a system of systems. Using object-oriented modeling practices, we
define a standard modeling template that can be recursively employed to decompose the overall system
into its constituent set of subsystems. The basic modeling template, the coordinated object, is shown in
Fig. 1. The coordinated object includes several critical elements. First, we assume that there is a set of
processing subsystems, P1 through PN, where processing tasks will occur. We further assume that the
processing tasks will be performed upon or with the assistance of other physical entities. These physical
entities enter the coordinated object and join the input queue. When they enter, the coordinated object's
supervisor specifies a set of tasks to be performed upon the entities. Hence, there are two distinct flows to
be  considered:  the  flow  of  physical  entities  and  the  flow  of  tasks. 

Each  coordinated  object  also  contains  a  set  of  interfacing  subsystems  that  perform  tasks  supporting 
implementation  of  tasks  at  processing  subsystems.  For  example,  a  material  handling  system  can  move  a 
job  entity  between  processing  substations.  Each  coordinated  object  also  contains  an  intelligent  controller. 

The  intelligent  controller  is  responsible  for  receiving  the  tasks  from  the  coordinated  object’s  supervisor  as 
each  entity  arrives  at  the  coordinated  object.  These  tasks  are  then  decomposed  while  additional  supporting 
tasks  are  defined  and  then  reassigned  to  the  processing  and  interfacing  subsystems  for  execution. 

Several  forms  of  decomposition  must  be  addressed  simultaneously.  The  coordinated  object  handles  those 
decompositions that deal with physical elements of the system. In particular, the definition of the processing
and interfacing subsystems generates both functional and spatial decomposition. Note, for example, that the
processing subsystems execute processing tasks while the interfacing subsystems execute tasks that
support execution of processing tasks. It is obvious that a functional decomposition has been employed.
Note that the processing and interfacing subsystems are typically distinct from one another. Hence, a spatial
decomposition occurs as well. The other decompositions include task, aggregation/disaggregation and
temporal decompositions. These occur within the intelligent controller that we will discuss shortly.

The modeler can recursively employ the coordinated object to decompose a complex system into its most
basic processes. That is, any subsystem within a given coordinated object can also be viewed as a
coordinated object. This gives rise to what we have termed the Recursive Object-Oriented Coordination
Hierarchy (ROOCH). In Fig. 2, we provide the ROOCH for the Rapid Acquisition of Manufactured Parts
flexible manufacturing system operated by the US Army Tobyhanna depot. It is self-evident that the
ROOCH captures the system-of-systems nature of these complex systems.

We  can  cover  only  briefly  the  concept  of  the  Coordinated  Object  and  the  ROOCH  in  this  paper.  The  reader 
is  referred  to  Davis  [3]  for  a  more  detailed  discussion. 


Fig.  1.  The  Coordinated  Object 

Unfortunately,  the  ROOCH  is  not  a  sufficient  solution  in  itself.  Rather,  the  ROOCH  defines  which 
subsystems  comprise  the  overall  system.  Essentially,  it  addresses  the  physical  concerns.  One  important 
outcome  of  the  ROOCH  is  the  specification  of  the  hierarchy  of  controllers  that  will  be  needed  to  manage 
the  system.  Note  that  each  coordinated  object  has  its  own  dedicated  intelligent  controller.  Further,  each 
intelligent  controller  serves  as  the  supervisor  to  the  intelligent  controllers  for  the  subsystems  that  are 
contained  within  the  coordinated  object.  In  addition,  each  intelligent  controller  also  has  a  supervisor  that  is 
the  intelligent  controller  for  the  coordinated  object  in  which  it  is  a  subsystem. 
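
The recursive structure described above can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the implementation behind the ROOCH: the class name `CoordinatedObject`, its fields, and the example shop, cell and machine names are all hypothetical. Each node stands for a coordinated object together with its dedicated intelligent controller, and the `supervisor` link records which controller supervises it.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass(eq=False)
class CoordinatedObject:
    """One node of the hierarchy: a coordinated object plus its controller.

    All names here are hypothetical; the paper defines the concept, not an API.
    """
    name: str
    processing: List["CoordinatedObject"] = field(default_factory=list)
    interfacing: List["CoordinatedObject"] = field(default_factory=list)
    supervisor: Optional["CoordinatedObject"] = None

    def add_processing(self, child: "CoordinatedObject") -> "CoordinatedObject":
        """Attach a processing subsystem; this node becomes its supervisor."""
        child.supervisor = self
        self.processing.append(child)
        return child

    def add_interfacing(self, child: "CoordinatedObject") -> "CoordinatedObject":
        """Attach an interfacing subsystem (e.g. material handling)."""
        child.supervisor = self
        self.interfacing.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "CoordinatedObject"]]:
        """Yield (depth, node) pairs, parents before their subsystems."""
        yield depth, self
        for sub in self.processing + self.interfacing:
            yield from sub.walk(depth + 1)


def build_example() -> CoordinatedObject:
    """Build a small, invented two-level hierarchy for illustration."""
    shop = CoordinatedObject("shop")
    cell = shop.add_processing(CoordinatedObject("cell_A"))
    shop.add_interfacing(CoordinatedObject("material_handling"))
    cell.add_processing(CoordinatedObject("mill_1"))
    cell.add_processing(CoordinatedObject("lathe_1"))
    cell.add_interfacing(CoordinatedObject("robot_loader"))
    return shop


if __name__ == "__main__":
    for depth, node in build_example().walk():
        boss = node.supervisor.name if node.supervisor else "(none)"
        print("  " * depth + f"{node.name} <- supervised by {boss}")
```

Because every subsystem is itself a `CoordinatedObject`, the same template serves at every level, and the hierarchy of controllers falls out of the supervisor links rather than being specified separately.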

After  we  have  defined  this  hierarchical  relationship,  we  must  then  define  the  mechanisms  by  which  these 
intelligent  controllers  will  interact.  One  approach  to  the  interaction  among  the  intelligent  controllers  is  the 
Real-time Control System, defined by Albus and Meystel [4]. Although this architecture has been
employed  for  several  systems,  the  author  believes  that  it  is  not  the  only  solution. 


Fig.  2.  The  Recursive  Object-Oriented  Coordination  Hierarchy. 

In Figure 3, we depict our schema for interactions among the intelligent controllers. Each controller consists of
three primary functions. The Task Acceptor is responsible for accepting new tasks from the supervisor for
implementation within the coordinated object where the controller resides. When new tasks are accepted, it
is necessary to ensure that the coordinated object executes the task in a feasible manner. We can assume
that other tasks have already been assigned to the coordinated object and are now being implemented.
When each task is assigned, the supervisor typically specifies a task completion time. If the task completion
times are improperly specified, there may be no feasible solution. Hence, it is essential that the Task
Acceptor within a given coordinated object negotiate with its supervisor's Task Assignor to establish a
meaningful completion time. Finally, it is essential that new tasks be continuously assigned to the
coordinated object to prevent it from becoming idle after completing its assigned tasks.

In discussing the Task Assignor, we assume that the response of the subsystems is not deterministic. This
uncertainty  evolves  naturally  as  a  consequence  of  providing  intelligent  controllers  at  each  of  the 
subordinate  subsystems.  Because  subordinate  subsystems  can  have  intelligent  controllers,  they  perform 
their  own  planning  and  their  response  will  depend  on  the  plans  they  develop.  In  short,  a  given  intelligent 
controller  cannot  or  should  not  perform  detailed  planning  of  a  subsystem  that  has  its  own  intelligent 
controller.  Once  we  accept  this  principle,  behavior  of  the  subordinate  subsystem  is  no  longer  deterministic. 

Given the inherent uncertainty of the subsystems' responses, it is essential that we employ feedback control
principles in managing the subsystems. The role of the Task Assignor is to employ the feedback control
law that has been selected for implementation. Using this feedback control law, the Task Assignor monitors
feedback information from its subordinate subsystems and then interacts with each subsystem's Task
Acceptor to assign new tasks for the subsystem to execute.

The  Task  Assignor  also  employs  on-line  simulation  in  order  to  project  the  future  performance  of  the 
subsystems  as  they  continue  to  operate  under  the  current  control  law  given  their  current  state  (see  Davis  [3] 
for a discussion of on-line simulation). This projected response provides feedback information to the Task
Assignor that resides within the supervisory intelligent controller. It also provides critical information that
will be employed by the other functional elements within the intelligent controller. Using this projected
response, the Task Assignor can further employ predictive control techniques. That is, it can base its
control actions not only upon its current state but also upon the state that it predicts will
occur. In this manner, the controller can look ahead, anticipate what may happen and act accordingly.


The Task Acceptor employs the projected response under the current control law as a baseline trajectory
upon which any new tasks must be included. In this manner, the Task Acceptor can negotiate a meaningful
completion date with its supervisor's Task Assignor in order to ensure that a feasible system response exists
if the system continues to operate under the current control law.
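
The negotiation step can be pictured with a deliberately simplified Python sketch. It rests on assumptions that are not part of the paper: the baseline trajectory is reduced to a single projected finish time for the work already assigned, the new task is simply appended after that work, and the names `Offer` and `negotiate_completion` are hypothetical. A real Task Acceptor would obtain the baseline from on-line simulation rather than from one number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Reply sent to the supervisor's Task Assignor (hypothetical structure)."""
    accepted: bool          # True if the requested completion time is feasible
    completion_time: float  # agreed time, or the counter-proposal if rejected


def negotiate_completion(baseline_finish: float,
                         task_duration: float,
                         requested: float) -> Offer:
    """Compare a requested completion time against the baseline projection.

    baseline_finish: projected finish time of already assigned tasks under
                     the current control law (assumed given).
    task_duration:   estimated time needed for the new task (assumed given).
    requested:       completion time proposed by the supervisor.
    """
    # Simplifying assumption: the new task runs after all current work.
    earliest_feasible = baseline_finish + task_duration
    if requested >= earliest_feasible:
        return Offer(accepted=True, completion_time=requested)
    # Otherwise counter with the earliest time the baseline can support.
    return Offer(accepted=False, completion_time=earliest_feasible)


if __name__ == "__main__":
    print(negotiate_completion(40.0, 5.0, 50.0))
    print(negotiate_completion(40.0, 5.0, 42.0))
```

The point of the sketch is only that the acceptance decision is made against the projected response, so a completion time that the baseline cannot support is answered with a counter-proposal instead of being accepted blindly.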

The projected response under the current control law also provides the performance standard against which
any other potential control law's performance will be compared. The determination and selection of new
control laws is addressed by the Performance Improvement Function. The operations of this function are
the most complex. Basically, the function begins by generating a new control law and then performs on-line
simulation of the system's behavior under that law given the system's current state. Then, the performance
comparison is made. If the performance of the new control law is better than that of the current control
law, then the new control law is sent to the Task Assignor for implementation.
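
The generate, simulate and compare cycle can be sketched in Python as follows. The sketch makes several assumptions that go beyond the text: a control law is represented as a sequencing rule for one subsystem's task list, the on-line simulation is replaced by a deterministic stand-in that accumulates total tardiness, and all names (`Task`, `fifo`, `earliest_due_date`, `shortest_first`, `simulate_tardiness`, `improve`) are hypothetical.

```python
from typing import Callable, List, NamedTuple, Sequence


class Task(NamedTuple):
    name: str
    duration: float  # processing time
    due: float       # negotiated completion time


# A "control law" is modelled here as a rule that orders the pending tasks.
ControlLaw = Callable[[Sequence[Task]], List[Task]]


def fifo(tasks: Sequence[Task]) -> List[Task]:
    return list(tasks)


def earliest_due_date(tasks: Sequence[Task]) -> List[Task]:
    return sorted(tasks, key=lambda t: t.due)


def shortest_first(tasks: Sequence[Task]) -> List[Task]:
    return sorted(tasks, key=lambda t: t.duration)


def simulate_tardiness(law: ControlLaw, tasks: Sequence[Task],
                       now: float = 0.0) -> float:
    """Stand-in for on-line simulation: total tardiness from the current state."""
    clock, total = now, 0.0
    for task in law(tasks):
        clock += task.duration
        total += max(0.0, clock - task.due)
    return total


def improve(current: ControlLaw, candidates: Sequence[ControlLaw],
            tasks: Sequence[Task], now: float = 0.0) -> ControlLaw:
    """Return a candidate only if it beats the current law's projection."""
    best, best_cost = current, simulate_tardiness(current, tasks, now)
    for law in candidates:
        cost = simulate_tardiness(law, tasks, now)
        if cost < best_cost:  # strict: keep the current law on ties
            best, best_cost = law, cost
    return best


if __name__ == "__main__":
    pending = [Task("a", 4, 10), Task("b", 1, 2), Task("c", 2, 4)]
    chosen = improve(fifo, [earliest_due_date, shortest_first], pending)
    print(chosen.__name__, simulate_tardiness(chosen, pending))
```

The projected performance of the current law acts as the standard, exactly as described above, so a new law is forwarded for implementation only when its projection is strictly better.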


Fig.  3.  Schematic  for  the  Interaction  Among  Intelligent  Controllers 

There  are  two  important  tasks  performed  by  the  Performance  Improvement  Function.  The  first  task  is 
performing  the  on-line  simulation  analysis  and  performance  comparisons.  The  technologies  needed  to 
perform  this  task  are  still  evolving  (see  Davis  [3]).  The  second  task  is  the  generation  of  the  alternative 
feedback  control  laws.  This  is  a  very  important  requirement  that  cannot  be  discussed  here  due  to  space 
limitations.  It  is  important  to  state,  however,  that  we  have  defined  means  for  specifying  control  laws  that 
will  permit  our  intelligent  controller  to  employ  any  type  of  planning  algorithm.  That  is,  we  will  not  attempt 
to  specify  which  planning  algorithm  should  be  employed. 


We can also assume that the subsystems are time variant, i.e., their behavior changes with time. The role of
the System Identifier is to constantly update the system model to reflect the subsystems' current behavior.
This  system  model  is,  in  turn,  used  by  the  other  functions  within  the  intelligent  controller  in  order  to 
perform  on-line  simulation.  Again,  the  manner  by  which  one  performs  system  identification  for  this  class 
of  systems  is  a  research  topic.  However,  the  manner  in  which  we  model  the  system  significantly  simplifies 
the  identification  task.  Under  our  modeling  framework,  we  need  only  to  monitor  the  time  required  to 
complete  tasks  and  the  probability  of  successfully  completing  a  given  task. 
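
As a rough illustration of how little needs to be tracked, the following Python sketch keeps running estimates of the two monitored quantities for one task type. The use of exponential forgetting to follow time-variant behavior is an assumption made for this sketch, not a method stated in the paper, and the class name `TaskStatistics` and its `forgetting` parameter are hypothetical.

```python
from typing import Optional


class TaskStatistics:
    """Running estimates for one task type at one subsystem (hypothetical)."""

    def __init__(self, forgetting: float = 0.2) -> None:
        if not 0.0 < forgetting <= 1.0:
            raise ValueError("forgetting must lie in (0, 1]")
        self.forgetting = forgetting
        self.mean_time: Optional[float] = None     # estimated completion time
        self.success_prob: Optional[float] = None  # estimated P(success)

    def update(self, duration: float, succeeded: bool) -> None:
        """Fold one observed task execution into the estimates."""
        outcome = 1.0 if succeeded else 0.0
        if self.mean_time is None or self.success_prob is None:
            # The first observation initialises both estimates.
            self.mean_time, self.success_prob = duration, outcome
            return
        a = self.forgetting
        # Recent observations get weight a; older ones decay geometrically.
        self.mean_time += a * (duration - self.mean_time)
        self.success_prob += a * (outcome - self.success_prob)


if __name__ == "__main__":
    stats = TaskStatistics(forgetting=0.5)
    for duration, ok in [(10.0, True), (20.0, False), (12.0, True)]:
        stats.update(duration, ok)
    print(stats.mean_time, stats.success_prob)
```

A larger forgetting factor makes the estimates follow recent behavior more quickly at the cost of more noise; the appropriate value would depend on how fast the subsystem actually drifts.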

It  is  also  important  to  distinguish  the  above  configuration  for  an  intelligent  controller  from  the  more 
conventional approach. Most designers of intelligent controllers separate the controller's functional operation into
planning and control (execution) components. If we are considering a single intelligent controller that is managing a
collection of simple dynamic processes (i.e., processes that do not have intelligent controllers), then this may be the ideal
approach. However, if we consider a control architecture with several hierarchical levels of intelligent
controllers, then one should question the separation of the planning and control functions. In such
architectures, most researchers recognize the need to distribute planning and control responsibilities among
all of the intelligent controllers. However, within an intelligent controller, most researchers try to separate the
planning and control responsibilities among the various functional elements (see Albus and Meystel [4]).

It  is  our  belief  that  planning  and  control  should  be  considered  in  an  integrated  fashion  by  every  intelligent 
controller  as  well  as  the  component  functions  within  the  intelligent  controller.  Our  goal  is  to  always 
consider  how  the  plan  will  be  implemented  while  we  are  formulating  the  plan.  We  have  attempted  to  define 
the  functional  elements  within  the  controller  in  order  to  implement  the  required  decomposition.  We  observe 
that  the  Task  Acceptor  has  the  longest  planning  horizon  because  it  is  considering  tasks  that  will  be  assigned 
to  the  coordinated  object.  The  Performance  Improvement  function  considers  tasks  that  have  already  been 
defined.  Finally,  the  Task  Assignor  attempts  to  assign  tasks  to  the  subordinate  subsystems  based  upon  their 
current  and  projected  states.  Note  also  that  the  planning  horizon  of  a  subordinate  subsystem  is  less  than  that 
of any functional element within its supervisor's intelligent controller.

The  intelligent  controller  is  also  responsible  for  performing  task  decomposition.  Here,  all  the  functional 
elements  are  able  to  do  task  decomposition.  The  Task  Acceptor  must  define  the  new  subtasks  that  must  be 
executed  under  the  current  control  law  if  the  new  task  is  accepted.  The  Performance  Improvement  function 
must  consider  all  tasks  when  it  selects  the  control  law.  Note  that  for  flexible  systems  the  employed  task 
decomposition  may  be  dependent  upon  which  control  law  is  selected.  Finally,  the  Task  Assignor  performs 
the  task  decomposition  that  is  needed  to  execute  the  control  law.  Note  also  that  task  decomposition 
inherently  implies  that  the  subsystems  will  consider  the  planning  problems  in  greater  detail  than  does  their 
supervisor.  This  further  implies  that  there  is  aggregation  of  detail  as  one  moves  up  the  control  hierarchy. 


SUMMARY  AND  CONCLUSIONS 

Given the space limitations, this paper can provide only the most basic introduction to the proposed control
architecture. There are numerous related details that should be addressed. For
example,  we  can  now  provide  a  mathematical  justification  for  our  design  of  the  intelligent  controller. 
Entirely  new  simulation  languages  are  being  developed  to  exploit  our  modeling  paradigm.  New 
technologies  are  being  developed  to  support  on-line  simulation  analyses.  The  topic  addressed  in  this  paper 
represents  an  entirely  new  area  of  planning  and  control  research,  for  which  little  is  now  understood. 

REFERENCES 

1.  M.D. Cannon, C.D. Cullum, E. Polak. 1970. Theory of Optimal Control and Mathematical
Programming,  McGraw-Hill  Book  Company,  New  York. 

2.  B.  Bishop,  P.  Shirkey,  M.W.  Spong.  1995.  An  experimental  testbed  for  intelligent  control.  Proc.  of  the 
American  Control  Conference,  Seattle,  WA. 

3.  W.J.  Davis.  1998.  Real-time  simulation:  the  need  and  the  evolving  research  requirement.  Simulation 
Handbook,  ed.  J.  Banks,  465-516,  Wiley-Interscience,  New  York. 

4.  J.S.  Albus,  A.  Meystel.  1997.  Behavior  Generation  in  Intelligent  Systems.  National  Institute  of 
Standards  and  Technology  Internal  Report,  Gaithersburg,  MD. 


PDM-based  Virtual  Enterprises  -  Bridging  the  Semantic  Gap 

Andreas  Karcher,  Jorg  Wirtz 

Institute  of  Information  Technology  in  Mechanical  Engineering  (itm) 
Technical  University  Munich,  Germany 


ABSTRACT 

This article will describe the experience that itm has gathered by participating in a European aircraft
project. In the first part, the problems of introducing standards-based integration of PDM (Product Data
Management) systems within a Virtual Enterprise will be described. It will be demonstrated that a principal
reason for these problems is the semantic gap between the interpretations of information objects managed
in common by the participants. The second part of the article will introduce a new approach to reducing the
semantic gap by addressing, in a Requirement Engineering process, the specific inter-company aspects of
data-sharing and data-exchange.


DESIGN  AND  MANUFACTURING  WITHIN  A  VIRTUAL  ENTERPRISE 

The increasing complexity of products and the globalization of markets in mechanical engineering
have created a need for lower costs, higher quality and shorter time-to-market in product
development and production [1]. Product design and manufacturing in worldwide networks offer great
potential for reducing costs and time-to-market [2]. The concept of a Virtual Enterprise (VE)
allows the core competencies of companies to be integrated in a flexible alliance for a defined period
of time. But collaboration in Virtual Enterprises requires new communication tools and data exchange
concepts to be used by the consortium members [2][3].

First of all, the information for designing and manufacturing the product must be made digitally accessible
throughout the Virtual Enterprise in order to establish a CSE (Concurrent/Simultaneous Engineering)
development process between the partner companies.


DATA  AND  PROCESS  INTEGRATION  REQUIREMENTS 

In order to reduce the time to market of a new product, in terms of both development and production time,
the partners within the VE must have access to all relevant product data. In practice, however, the
information flow that runs parallel to the physical flow of goods between the logistics departments of the
partners often takes more time than actually transporting the goods themselves! This is often a consequence
of manually manipulating the manufacturing data that is to be exchanged between the different production
planning systems (MRP: Material Resource Planning). Problems in exchanging engineering and
manufacturing data between different partners of the VE occur due to semantic differences in interpreting
the underlying information: none of the known exchange standards, such as EDIFACT, ANSI X12 or ODETTE,
can secure the semantic integrity of the data exchange [4]. But in order to achieve improved collaboration
between the engineering and manufacturing partners of the VE in a Concurrent/Simultaneous process, it is
necessary to distribute information about parts and raw materials at a very early stage of product design.

Increasingly, PDM systems are becoming the IT backbone, managing dynamic and ever more complex
product development and manufacturing processes. As a consequence of the distributed processes of global
VEs, new requirements are emerging. These new requirements relate specifically to process
integration and data exchange in cross-company collaboration. This being the case, we must harmonize the
various process-related cultures of the partners, thus reaching integration on a data level and a process
level using IT solutions. A centralized approach controlled and managed by a central database fails to
perform adequately because of flexibility requirements, permanently changing VE structures and the effort
required to agree upon the contents and mechanisms of the central database [5]. A federated approach allowing an


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




agreement on data integration, process integration, different PDM and IT systems as well as different
operating cultures is ultimately required. This integration should not, however, focus separately on the
data level or the process level; an integrated approach incorporating both data and process aspects is
required. Standards such as STEP (STandard for the Exchange of Product Model Data, ISO 10303),
including Application Protocols such as AP203 or AP214, can provide a good starting point for data
integration, but a remaining semantic gap, i.e. different interpretations and understandings of “common”
information objects and processes, needs to be bridged in a conceptual Integration Layer. Thus one of the
main challenges is to begin at the highest level by using standardized and possibly re-usable models to
reach a customized, VE-specific Integration Layer.

Basic  methods  of  data  integration 

Basically,  two  different  approaches  for  Integration  Layers  can  be  distinguished: 

• Data Exchange Approach: as a result of an asynchronous exchange of files between the PDM systems,
the same (replicated) information is held in the PDM systems of the partners.

• Data Sharing Approach: information is distributed among the PDM databases and accessible online
via a middleware layer.

To  make  data  transfer  possible,  defined  and  agreed  interfaces  are  needed  through  which  the  sender  and  the 
receiver  can  access  the  information.  A  defined  interface  consists  of  a  defined  notation  (representation 
language)  and  a  defined  logical  data  model.  In  general,  there  are  two  ways  to  achieve  such  an  interface  for 
an  Integration  Layer: 

•  Creating  a  new  interface  definition  from  scratch 

•  Adapting  a  standardized  interface  definition 

In  order  to  save  resources  when  introducing  an  Integration  Layer  in  a  VE,  standardized  interface 
definitions  should  be  used. 

State  of  the  art  approaches  for  standardized  interface  definitions 

The STEP Standard (ISO 10303), established in 1985, was the first standard with an integrated product
model approach which provided semantic data models (defined in Application Protocols such as AP214
and AP203) and mechanisms for PDM data exchange. The formal data specification language EXPRESS is
also standardized in Part 11 of ISO 10303 [6].


The file-based exchange of STEP is based on ‘Processors’ which transform the data from the PDM systems
into files with a standardized format and a standardized data model (Part 21 of ISO 10303) (see Fig. 1).
The STEP standard also provides mechanisms for data sharing via the Standard Data Access Interface
(SDAI) [8]. Another approach to the data sharing concept is based on the Common Object Request Broker
Architecture (CORBA) proposed by the Object Management Group (OMG). The scope of this Integration Layer
is to access the services of PDM systems of partner companies by sending and receiving requests to object
implementations of the PDM systems via the common object broker layer (ORB layer) [9].

Both the proposal of the PDM Enabler group in response to a request of the OMG, and the results of the
RISESTEP project, provide object models for the CORBA-based common use of PDM services (as well as
data sharing) between different PDM systems [9][10].


Fig. 1. A STEP-based Integration Layer




HARMONIZING  DATA  AND  PROCESSES 

Customizing  a  PDM  STEP-Processor 

Our  experience  in  projects  aiming  for  PDM-based  collaboration  in  VEs  is  that  the  agreement  to  use  PDM 
systems  and  a  certain  Integration  Layer  based  on  a  standardized  data  model  such  as  STEP  is  a  necessary 
decision  in  order  to  achieve  standards-based  data  integration. 

However, this does not solve all problems. Because their business processes differ, companies customize
their PDM systems differently [3]. In practice, this means that objects in the data model of the PDM
systems are modified, new objects with new behavior are created and/or objects are interpreted in
company-specific ways.

The semantics of an information object can be defined as the description of its intended meaning.

Whereas  the  definitions  of  semantics  and 
attributes  of  elements  such  as  points  and  lines 
are  used  almost  universally,  and  the  definition 
of  elements  such  as  surfaces  and  solids  is 
uniform  in  most  CAD  systems,  the  definitions 
of  PDM  objects  are  company-specific. 

Whereas  it  is  possible  to  agree  on  a  certain  data  model  for  CAD  data  exchange  and  then  buy,  install  and  use 
an  appropriate  STEP-Processor  between  CAD  systems,  this  approach  will  often  not  work  for  PDM.  This  is 
due  to  the  mapping  rules  of  the  Integration  Layer,  which  transform  information  from  the  standardized 
neutral  data  model  that  is  exchanged,  or  shared,  to  the  specific  data  model  of  the  PDM  system.  Since  the 
PDM  system  is  customized  these  rules  must  also  be  modified  according  to  the  customization  of  the  PDM 
system  (see  Fig.2).  In  other  words,  a  customizable  Integration  Layer  is  required. 
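The need for a customizable Integration Layer can be illustrated with a minimal sketch. It assumes that mapping rules can be written as a simple table from neutral-model attributes to PDM attributes; all attribute names below (`item.id`, `Part.number` and so on) are hypothetical and are not taken from AP214 or from any PDM product.

```python
# Mapping rules as they might ship with an off-the-shelf processor
# (hypothetical names): neutral-model attribute -> attribute of the
# uncustomized PDM data model.
STANDARD_RULES = {
    "item.id": "Part.number",
    "item.name": "Part.title",
    "item_version.id": "Part.revision",
}

# Assumed customization: one company renamed an attribute and added another.
COMPANY_OVERRIDES = {
    "item.name": "Part.designation",
    "item.description": "Part.remark",
}


def customized_rules(standard, overrides):
    """Overlay company-specific rules on the standard ones without mutating either."""
    return {**standard, **overrides}


def to_pdm(neutral_record, rules):
    """Translate a neutral-model record; also report attributes that have no rule."""
    mapped, unmapped = {}, []
    for attribute, value in neutral_record.items():
        if attribute in rules:
            mapped[rules[attribute]] = value
        else:
            unmapped.append(attribute)
    return mapped, unmapped
```

Calling `to_pdm` with `STANDARD_RULES` on a record that carries `item.description` leaves that attribute unmapped, whereas the rules returned by `customized_rules` place it in the customized PDM attribute. This is the kind of adjustment that a standard processor alone cannot provide.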

“A ‘semantic mapping’, which makes possible bilateral mapping rules, is required....” [11].

So,  every  partner  in  the  VE  must  define  their  mapping  rules  for  their  customized  PDM,  and  all  partners 
must  map  the  same  information  to  the  same  attributes  of  entities  in  the  data  model  shared  or  exchanged. 

An  example  of  a  semantic  gap 

Experiences  within  projects  with  DaimlerChrysler  Aerospace  showed  that  the  independent  definition  of 
mapping  rules  by  partner  companies  in  a  VE  leads  to  different  results.  The  different  interpretations  of  the 
neutral  data  model  to  be  exchanged  or  shared  led  to  wrong  or  senseless  populations  of  the  PDM  systems’ 
databases  and/or  errors  in  the  Integration  Layer.  Different  interpretations  of  exchange  entities  were  often 
not  found  until  the  first  test-exchange  took  place.  This  led  to  additional  effort  in  the  design  and 
development  of  the  Integration  Layer  and  can  cause  project  delay. 

Figure  3  shows  an  example  of  how  company-specific  interpretations  of  the  agreed  data  exchange  model  can 
give  rise  to  collisions  when  mapping  takes  place. 

Equivalent information objects of the two companies A and B in the VE, such as a Change Request, which
identifies a formal proposal for changing the product and causes the engineering action to achieve the
change, may be mapped onto different entities in AP214 of STEP. Mapping a ‘Change Request’ from
company A onto the STEP AP214 entity ‘Activity’ would be valid, as would mapping it onto the AP214
entity ‘work_request’, although the definitions of the two entities are different.

An ‘Activity’ is described as ‘a fact of achieving or accomplishing an action’, whereas a ‘work_request’ is
described as the ‘solicitation for some type of work to be done’ [12] (see Fig. 3).
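Such collisions can in principle be found before the first test exchange by comparing the partners' mapping tables. The following is a minimal sketch, assuming each partner publishes its rules as a table from local information object to neutral-model entity. The table layout and the assignment of ‘Activity’ to company A and ‘work_request’ to company B are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical mapping tables: local information object -> neutral-model entity.
# Entity names follow the Change Request example in the text; which partner
# chose which entity is assumed here.
MAPPINGS = {
    "Company A": {"Change Request": "Activity"},
    "Company B": {"Change Request": "work_request"},
}


def find_mapping_collisions(mappings):
    """Return the information objects that partners map onto different entities."""
    targets = defaultdict(dict)
    for partner, rules in mappings.items():
        for info_object, entity in rules.items():
            targets[info_object][partner] = entity
    # Keep only objects for which more than one distinct entity was chosen.
    return {
        info_object: by_partner
        for info_object, by_partner in targets.items()
        if len(set(by_partner.values())) > 1
    }
```

A non-empty result from `find_mapping_collisions(MAPPINGS)` points to information objects whose interpretation still has to be negotiated. An empty result does not prove that the partners share the same semantics, only that they chose the same entity.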


Fig. 2. Collision of a standard STEP-Processor and a customized PDM system




The different interpretation of entities of the standardized data model is called a semantic gap between
the partners. The reason for this problem is a lack of semantic precision of entities of data models in
STEP. Concepts such as jwf and ‘action’ are not defined with enough rigor to be shared by different
application protocols. Different interpretations are assumed for the same term in different places
[13][14]. Clearly then, there must be a common understanding of the semantics of the neutral data model
to be exchanged or shared between the partner companies. This is the basis for a correct mapping onto the
specific PDM data models and must be established before any Integration Layer is customized or
implemented.

As described above, the semantics of the information objects depend on the process steps in which they
are created. Therefore the trigger events for synchronizing the production information between the
partners need to be harmonized. Excessive harmonization of the processes between the partner companies
may, however, lead to an unacceptable workload, because of the specific engagement of the companies with
other VEs and their historical backgrounds. Therefore an understanding of the process steps must be
reached between the partners on a general basis. Problems in achieving a common understanding on the
process level are also included under the term “semantic gap”. The complexity of the harmonization
between the partners increases because data and process harmonization need to be specified in an
integrated way.

The  next  question  is,  how  do  companies  in  a  VE  achieve  a  common  understanding  and  agreement  on 
objects  which  are  often  still  under  definition  internally? 

Experience in the European aircraft project, in which DaimlerChrysler Aerospace and itm are both
participants, showed that the simple approach, in which each company sends the other a list of the
attributes that it intends to use for exchange, was not successful. A more systematic approach is
therefore required to achieve a common understanding of the way the VE partners will use the data model
to be exchanged or shared.


[Figure: ‘Change Request’ objects in the data models of the partner companies, annotated “Proposal and
work order for changing the product”, “... solicitation for some work ...” and “Proposal for changing the
product”.]

Fig. 3. Example of a semantic gap


OBJECT-ORIENTED  REQUIREMENT  ENGINEERING 

To  reduce  the  semantic  gap  between  the  partners,  a  systematic  Requirement  Engineering  (RE)  should  be 
carried  out  to  achieve  a  common  understanding  of  certain  requirements  for  information  content. 

According to Pohl [15], RE can be defined as ‘a systematic process of developing requirements through an
iterative co-operative process of analyzing the problem, documenting the resulting observations in a variety
of representation formats and checking the accuracy of the understanding gained’. A four-step generic
approach achieves the goals of RE:

• Requirement Elicitation: Find out the requirements for the problem to be solved

• Requirement Negotiation: Discuss and agree on the requirements identified in the elicitation step

• Requirement Specification: Derive a formal specification from the requirements identified and negotiated

• Requirement Validation: Certify that the specified requirements are consistent with the initial intentions


The four RE steps can be applied to the specific problem of defining a common Integration Layer for a VE.




Common  Requirement  Elicitation 

After  selection  of  the  appropriate  integration  approach  (data-exchange  or  data-sharing)  and  the  agreement 
on  a  certain  standardized  data  model,  the  main  requirement  elicitation  process  for  PDM  data  exchange  can 
start  with  the  aim  of  achieving  a  common  understanding  of  the  data  model.  Common  materialization  rules 
for  the  neutral  data  model  should  be  defined.  The  information  objects  identified  and  managed  in  the  PDM 
systems  of  each  partner  must  be  mirrored  on  suitable  objects  in  the  standardized  data  exchange  model. 

To avoid a semantic gap, the information objects should be identified together with the partners from the
very  beginning  by  defining  a  VE-appropriate  interpretation  of  the  standardized  neutral  data  model.  The 
requirements  as  to  which  information  should  be  exchanged  can  be  elicited  in  workshops  with  users  of  all 
participating  companies,  or  by  analyzing  the  information  which  was  exchanged  in  other  consortia  or  before 
the  digital  exchange  started  in  this  VE.  This  can  be  done  by  trying  to  identify  and  interpret  appropriate 
objects  of  the  chosen  standardized  data  model.  As  they  are  used  to  manage  the  product  structure  and  the 
related  documents  and  files,  the  Bill  of  Material  (BoM)  and  the  Drawing  Tree  documents  are  a  suitable 
base  for  analyzing  the  information  that  must  be  exchanged.  The  decision,  as  to  whether  a  certain 
information  object  should  be  exchanged  or  not,  can  be  influenced  by  factors  such  as  whether  it  is  necessary 
for  Concurrent/  Simultaneous  Engineering  (CSE),  or  if  the  additional  effort  to  manage  the  exchange  of  this 
object  adds  value  to  the  cross-company  process  chain. 

On the other hand, trigger points can be determined by analyzing the impact of certain information objects
on common processes. These include change requests, work orders and maturity stages of parts to be
manufactured. A further process analysis should define precisely what happens when such a triggering
event occurs, for example which information should be exchanged on a certain event and which reaction is
expected when certain information is sent in response to a certain trigger.

Negotiation  and  Specification 

What  is  the  best  way  to  achieve  agreement  on  how  the  data  model  should  be  interpreted  and  to  negotiate 
how  the  company-specific  interpretations  of  entities  in  the  data  model  can  be  exchanged? 

Our  experience  within  the  DaimlerChrysler  Aerospace  aircraft  project  showed  the  success  of  an  approach 
based  on  exchanging  tables  of  proposals  for  the  semantic  definitions  of  entities  -  with  examples  of  the 
populated  data  model  -  between  the  partner  companies  of  the  Virtual  Enterprise. 

As we have seen, data and process requirements are closely interdependent. Modeling data and process
information separately therefore carries the danger of inconsistencies between the two specifications.
The object-oriented paradigm, on the other hand, has proven its ability to model reality in a
user-friendly way. After preliminary separate data and process models have been produced, both aspects
need to be combined within one integrated object-oriented model (see Fig. 4).

After  the  neutral  data  model  is  defined  and 
agreed,  rules  must  be  drawn  up  to  populate  it.  It  is 
very  important  to  have  an  agreement  on  the  rules 
to  populate  the  data  model  in  order  to  specify  the 
Integration  Layer.  If  some  identifiers  in  STEP 
physical  files  are  missing,  or  have  unexpected 
values,  then  the  Integration  Layer  will  probably 
not  work  properly. 

Mapping  rules  are  based  on  a  defined  set  of  values.  The  basis  for  these  rules  can  only  be  the  vision  of  what 
the  common  product  structure  should  look  like.  Again,  user  workshops  and  the  analysis  of  the  Bill  of 
Materials  will  help  to  find  the  requirements  for  suitable  values  for  the  attributes. 
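Agreed population rules can be checked mechanically before data reaches the Integration Layer. The sketch below assumes the rules are recorded per entity as a list of required attributes and a set of permitted values. The attribute names and the permitted values are hypothetical and only illustrate the idea of a defined set of values.

```python
# Hypothetical population rules for one neutral-model entity: which attributes
# must be populated and which values the partners have agreed to accept.
POPULATION_RULES = {
    "work_request": {
        "required": ["id", "request_type", "description"],
        "allowed_values": {"request_type": {"change_request", "work_order"}},
    },
}


def check_instance(entity, attributes, rules=POPULATION_RULES):
    """Return a list of readable rule violations for one entity instance."""
    rule = rules.get(entity)
    if rule is None:
        return [f"no population rule agreed for entity '{entity}'"]
    problems = []
    # Missing identifiers: required attributes that are absent or empty.
    for name in rule["required"]:
        if attributes.get(name) in (None, ""):
            problems.append(f"{entity}.{name} is missing")
    # Unexpected values: populated attributes outside the agreed value set.
    for name, allowed in rule["allowed_values"].items():
        value = attributes.get(name)
        if value not in (None, "") and value not in allowed:
            problems.append(f"{entity}.{name} has unexpected value '{value}'")
    return problems
```

An empty list means the instance satisfies the agreed rules. Any other result names the missing identifier or unexpected value, which are the two failure cases mentioned above for STEP physical files.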


Fig. 4. Object-Oriented Requirement Engineering




Validation 

As our experience in the European aircraft development consortium has revealed, even an integrated RE
process was not able to cover all semantic problems, and so some of the inconsistencies were not discovered
until a final testing and validation process took place. Test scenarios should therefore be defined
precisely with the intention of testing all the information content to be exchanged or shared. After the
functionality that is relevant for data exchange in each of the individual PDM systems has been tested, a
cross-company test should be performed.

CONCLUSION 

PDM systems, with their ability to manage complex product-related data in distributed CSE processes,
increasingly form the IT backbone of Virtual Enterprises in the realm of Global Engineering and
Manufacturing. On the basis of a federated integration approach, the remaining semantic gap, which results
from different interpretations and understandings of product- and process-relevant information objects,
has to be bridged in a conceptual Integration Layer. The itm approach, a result of experience in a European
aircraft development project, allows a systematic and evolutionary requirement engineering process on the
basis of the object-oriented paradigm and with consideration of existing standards. This approach leads to
a consistent and validated specification of the Integration Layer, a prerequisite for a PDM-based VE in
which different interpretations and understandings of “common” information objects and processes are
alleviated.

REFERENCES 

1. Krause, F.-L.; Kind, Chr., 1996. "Potentials of information technology for life-cycle-oriented product
and process development" in: Krause, F.-L.; Jansen, H.: Life Cycle Modeling for Innovative Products
and Processes; Chapman & Hall, London.

2. Eversheim, W.; Schuth, S.; Bremer, C.; Molina, A., 1998. "Globale virtuelle Unternehmen"; ZWF 3/98,
93. Jahrgang; Carl Hanser Verlag.

3. Spur, G.; Krause, F.-L., 1997. "Das Virtuelle Produkt - Management der CAD Technik"; Carl Hanser
Verlag, München Wien.

4. Dangelmeier, W., 1996. "PPS in der virtuellen Fabrik"; proceedings of ‘Fortgeschrittene
Informationstechnologie in der Produktentwicklung und Fertigung’, 2. Internationales Heinz Nixdorf
Symposium für industrielle Informationstechnologie am 20/21; HNI-Verlagsschriftenreihe, Bd. 19
(Rechnerintegrierte Produktion/Wirtschaftsinformatik).

5. Friedmann, T.; Jungfermann, W.; Schmid, C., 1998. "Global Engineering - Welchen Beitrag leisten
EDM-Systeme und das Internet?"; EDM-Report Nr. 4/1998, Dressler Verlag GmbH, Heidelberg.

6. N.N.: ISO 10303-11, "Industrial Automation and Integration - Product Data Representation and
Exchange - Part 11"; The EXPRESS language reference manual.

7. ISO 10303-22 (committee draft), 1994. "Industrial automation systems and integration - Product data
representation and exchange"; Part 22: Standard data access interface specification.

8. Object Management Group (OMG), 1998. "The Common Object Request Broker: Architecture and
Specification"; Revision 2.2.

9. N.N., 1997. "Product Data Management Enablers Proposal to the OMG in Response to OMG
Manufacturing Domain Task Force RFP1"; Initial Submission, OMG Document mfg/97-04-01,
Metaphase Technology, Inc.

10. Behrens, H.; Lotter, N.; Machner, B., 1998. "PDM integration using the CORBA and STEP standards";
Proceedings of the ProSTEP Science Days '98, Wuppertal.

11. Eigner, M.; Zagel, M., 1997. "STEP-Compliant PDM Solutions"; Proceedings of the STEP Forum '97 -
Development partnerships require compatible product data, 17.4.97, Munich, ProSTEP GmbH.

12. Hemmelmann, A., 1997. "Entwicklung eines Datenmodells zum Austausch von Produktdaten in der
Europäischen Luftfahrtindustrie"; Diploma thesis carried out at the Institute of Information Technology
in Mechanical Engineering (itm), TU München.

13. Guarino, N.; Borgo, S.; Masolo, C., 1997. "Logical modelling of product knowledge: towards a well-
founded semantics of STEP"; Proc. Euro. PDT Days '97, Sophia-Antipolis, PDTAG-AM, ESPRIT 9049.

14. Nowacki, H., 1998. "Scientific issues in product data technology"; Keynote Address at the ProSTEP
Science Days '98, Wuppertal.

15. Pohl, K., 1996. "Process-centered requirements engineering"; Research Studies Press Ltd., Taunton,
Somerset, England.




A  Methodology  to  Diagnose  the  Target  Cost  in 
a  Manufacturing  Process 

A.Arioti,  C.  Fantozzi,  M.Granchi,  E.Vettori 

Department  of  Mechanical,  Nuclear  and  Production  Engineering 
University  of  Pisa,  Pisa,  Italy 


ABSTRACT 

Techniques of Target Cost Management have been studied to help companies identify the causes of parts
being rejected because of incorrect design choices. Part of the problem can be removed in the design phase
by developing a model to estimate costs. This provides the designer with a tool to carry out cost
estimations and make correct choices. The model also permits the cost of a new component belonging to a
well-understood family of parts to be estimated directly and rapidly. This study was made for an
important Italian company.


INTRODUCTION 

Target Costing is essentially a multifunctional activity which employs interdisciplinary groups made up of
professionals in charge of the various company functions involved in the planning, manufacturing and
commercialisation of a specific product. Its aim is to identify the causes of cost and, by transferring
knowledge of the production line and studying its typical problems in the product planning phase, to
remove the causes of anomalous costs even before production begins.

Target  Costing  was  developed  in  Japan  by  companies  such  as  NEC,  SONY,  NISSAN,  and  above  all, 
TOYOTA  as  an  instrument  to  plan  costs  and  combine  the  instruments  of  Cost  Management  and  Cost 
Engineering. 

By applying the methodology of Target Costing during the planning phase, it has been demonstrated that,
among the economic characteristics of a product, i.e., selling price, industrial production cost and
company profit, which are related by the equation price - cost = profit, the selling price is imposed by
the market and the profit is defined by the financial reality of the company. Production cost is therefore
the only variable that can be manipulated, so that the true production cost will be less than the one
deduced from the equation cost = price - profit.
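The relation cost = price - profit can be turned into a small check of an estimated cost against the allowable cost. The following is a sketch only: the function names are hypothetical and any figures passed to it are illustrative, not data from the study.

```python
def allowable_cost(target_price, target_profit):
    """Allowable production cost from the relation cost = price - profit."""
    if target_profit > target_price:
        raise ValueError("target profit cannot exceed target price")
    return target_price - target_profit


def cost_gap(estimated_cost, target_price, target_profit):
    """Amount by which an estimate exceeds the allowable cost (negative if below it)."""
    return estimated_cost - allowable_cost(target_price, target_profit)
```

With a market price of 100 and a required profit of 15 (arbitrary units), the allowable cost is 85. An estimated cost of 90 then gives a gap of 5 that has to be removed by design or process changes, whereas an estimate of 80 already meets the target.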

Estimation  of  costs  by  traditional  methods  first  needs  to  characterise  the  product  according  to  the  functions 
required  and  these  are  determined  according  to  the  components  needed.  Obviously  it  is  better  to  rely  on 
cost  data  of  the  components  used  with  similar  products  that  already  exist. 

Such  an  approach  does  not  permit  assignment  of  a  cost  to  the  single  phase  which  characterises  production 
of  a  general  component.  Such  a  limitation  is  the  principal  cause  preventing  an  effective  critical  analysis  of 
the  planning  choices  on  single  components. 

Moreover, there is a risk of attributing indirect costs to a component by rules which are not relevant.
The real cost is determined only at the end, and a possible reduction, which generally requires some
changes in the planning or manufacturing cycle, may therefore prove too expensive.

If  we  refer  to  the  methodology  of  Target  Costing,  the  work  is  developed  according  to  the  following  phases: 

1. Transferring knowledge and understanding of manufacturing line problems to the planning phase,

2.  Identifying  the  expense  threshold  of  the  various  aspects  which  characterise  the  process  in  order  to 
make  the  planner  responsible  for  choices  within  his  competence, 

3.  Drawing  up  the  "Rules  of  Design  for  Manufacturing"  to  apply  in  the  planning  of  a  new  component, 




4.  Developing  a  software  system  to  enable  the  planner  to  use  the  numerical  data  contained  in  the  graphs 
and  tables  on-line  without  having  to  write  down  the  necessary  operations  on  paper, 

5. Developing a suitable model for estimating the cost of components, to give the planner an instrument by
which to carry out an economic analytical evaluation of the choices within his competence and, at the
same time, to give management an accounting instrument by which it is possible to obtain more
precise data in a very short time.


Fig.  1.  Traditional  approach  for  costing. 


APPLICATION  OF  THE  TARGET  COST  IN  THE  PLANNING  AREA 

The study has been developed in such an area and needs an interdisciplinary group in which each member
follows the development of one of the types of components that constitute the product.

Among the members of the interdisciplinary group, the planner can have a great influence on the product
cost. He must take into account the economic characteristics of the project choices and of the “time to
market”, in order to reduce the time between product planning and market entry. The planner therefore
must have at his disposal:

1.  Exhaustive  information  about  the  product  materials  and  existing  production  technologies  not 
necessarily  used  by  the  Company, 

2. "Design Rules for Manufacturing" drawn up by taking into account the problems of production,

3.  Data  processing  support  able  to  optimise  the  solutions  technically  and  economically  so  as  to  improve 
the  quality  of  the  product  without  increasing  its  price. 

It  will  therefore  be  necessary: 

•  to  transfer  the  knowledge  and  fundamental  production  line  problems  to  the  planning  phase, 

•  to  determine  the  economic  threshold  of  the  various  aspects  which  characterise  the  process, 

•  to  draw  up  "Rules  of  Design  for  Manufacturing"  to  use  when  planning  a  new  component, 

• to develop software which enables the planner to use numeric data contained in graphs and tables
on-line, so there is no need to write down the required operations on paper,

•  to  develop  a  cost  estimation  model  to  give  planners  an  economic  and  critical  evaluation  of  the  choices 
within  their  competence. 




[Figure: diagram with the blocks Planning (technical specifications, management accounting, drifting part
cost, detailed project, Value Engineering), Objective data (performance, selling price, profit, current
technological standard), Competitive products, Customers' requirements, Target price, Target profit,
Allowable product cost, Allowable part cost, Target cost for part and Manufacturing cycle. Annotations:
"The planning phase needs 'Rules of Design for Manufacturing' to carry out a review of the suitable
choices"; "Transfer of knowledge and production line problems to planning"; "Management needs an
'accounting instrument' to determine a sufficiently reliable cost estimation".]

Fig. 2. Target cost methodology.


ASPECTS  OF  THE  STUDY 

The activities carried out have examined the causes of rejection and identified those due to the planning
process. A careful analysis has been carried out on the rejected elements in order to determine the planning
faults responsible for the rejects. Planning rules have been studied to take into account the connections
among the planning choices.

We have examined the die-casting process of aluminium elements and have studied and developed
suitable software on the basis of "Rules of Design for Manufacturing". The aim has been to let the planner
work on-line, immediately acquiring the necessary numeric data without requiring long
calculations on paper.

The  study  has  been  developed  for  an  important  Italian  Company  which  manufactures  scooters  from 
aluminum  alloy  die-cast  parts. 

The  software  has  been  developed  with  the  principal  interface  components  shown  in  Fig.  3.  It  processes  the 
numeric  data  contained  in  graphs  and  schedules  automatically.  The  numeric  values  which  appear  in  the 
software  are  those  recommended  for  correct  planning  of  the  aluminum  alloy  die-cast  parts,  in  some  cases 
processed  to  take  into  account  knowledge  of  company  technicians  in  close  contact  with  the  process. 




Therefore, for problems connected with the correct design of a die-cast part, each planning screen shows only the
input boxes in which to insert the requested numeric values and the material. Immediately afterwards, the output
values for the chosen material appear automatically and give the inclinations, tolerances, etc.


[Figure 3 (screenshot). Recoverable menu entries: Inclination; Thickness; Tapping; Parallelism; Perpendicularity; Coaxiality; Information; Thickness tolerances; Plan control; UNI 6387; Stiffening ribs; Connection of two faces; Connection of three faces; Connection of four faces; Conicity of die; Dimensional tolerances. Other entries are not legible.]

Fig.  3.  The  principal  menu  of  the  program 


In order to make the planner's task much easier, we are developing a cost estimation model in Excel using
Visual Basic programming, starting from the influence of various factors on the cost of the die-cast part as
shown in Fig. 4.


CONCLUSIONS 

Referring to the equation cost = price - profit, this estimation model makes it possible to
choose new components in the planning phase, before mass production, thus intervening in the
project when necessary and making changes aimed at controlling the real production cost.
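The relation cost = price - profit can be illustrated with a minimal sketch. The function names and the numeric figures below are hypothetical and are not taken from the study; the sketch only shows how an estimated part cost might be compared with the allowable cost during planning.

```python
def allowable_cost(target_price: float, target_profit: float) -> float:
    """Allowable cost from the relation cost = price - profit."""
    return target_price - target_profit


def cost_gap(estimated_cost: float, target_price: float, target_profit: float) -> float:
    """Positive gap: the estimated cost exceeds the allowable cost."""
    return estimated_cost - allowable_cost(target_price, target_profit)


# Hypothetical figures for one die-cast part (illustrative only).
price, profit, estimate = 12.0, 2.5, 10.25
print(f"allowable cost: {allowable_cost(price, profit):.2f}")
print(f"gap to close during planning: {cost_gap(estimate, price, profit):.2f}")
```

A positive gap signals that the planner would need to revise design choices before mass production; a zero or negative gap means the estimate already respects the allowable cost.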

In  the  first  application  of  the  software  with  "Rules  of  Design  for  Manufacturing"  the  following  advantages 
have  been  achieved: 

1. considerable reduction in the time necessary to carry out calculations and make choices while designing
the project,

2.  considerable  reduction  in  possible  errors  in  numerical  calculations, 




3.  obtaining  numeric  values  on-line  giving  the  possibility  to  plan  based  on  consideration  of  the  economic 
limits  of  the  process, 

4.  standardisation  of  the  technical  references  of  components  involved  in  aluminium  die-casting. 


[Figure residue: only the label "Rejects incidence" is legible.]


We  are  carrying  out  a  subsequent  study  of  the  software  to  estimate  cost  and  are  convinced  that  other 
advantages  can  be  reached: 

1. A considerable reduction in the time necessary to carry out an estimate,

2. Estimated costs that are considerably close to the real ones,

3. The possibility of producing different estimates for a new component by simply changing the numeric inputs, so that
the planner can carry out an economic and critical evaluation of the choices within his competence.

The detailed breakdown of all the cost items which characterise a product makes value
engineering sessions faster and makes it possible to verify the improvements which result from
planning variations aimed at minimum cost.


REFERENCES 

1.  Horvath  P.,  1993.  Target  Costing  a  State-Art-Review.  ISF  International  Ltd,  pp.  1-64. 

2.  Tanaka,  Yoshikawa.  Innes,  ed.,  1994.  Contemporary  Cost  Management.  Chapman  &  Hall. 

3.  _ ,  1996.  FIAT  Normative. 

4.  Parsaei,  H.R.,  Sullivan  W.G.,  1993.  Concurrent  Engineering,  Chapman  &  Hall. 

5. Halevi, G., 1993. The magic matrix as a smart scheduler. Computers in Industry, 21, 245-253.




Resource  Allocation  Model  for  a  Fast-Tracked  Project 

Yassiah  Bissiri  and  Scott  Dunbar 

Department  of  Mining  and  Mineral  Process  Engineering 
University  of  British  Columbia,  6350  Stores  Road,  Vancouver,  B.C.,  V6T  1Z4,  Canada 
Email: bissiri@mining.ubc.ca, wsd@mining.ubc.ca


ABSTRACT 

The  concept  of  fast-tracking  a  project,  although  generally  economically  beneficial,  is  a  risky  undertaking. 
The  risks  vary  from  being  unable  to  complete  the  project  in  the  expected  time  to  higher  costs  due  to 
excessive  compression  of  the  activity  duration.  This  paper  describes  the  variables  involved  in  fast-tracking 
a  project  and  then  demonstrates  that  risks  can  be  reduced  if  proper  resources  are  carefully  allocated  to  the 
project.  Reducing  the  duration  of  an  activity  ("crashing")  within  a  project  usually  requires  additional 
investment and/or resources. These resources can be found within the project's pool of funds, such as using
overtime for manpower, or they can be brought into the project as additional items. The success of the
fast-tracking approach depends on minimizing the cost of these additional resources. A simulation model is
described  that  allocates  resources  to  project  activities  in  a  way  so  as  to  minimize  the  additional  cost  of 
resources.  The  fact  that  the  start  time  of  an  activity  depends  on  the  completion  time  of  its  predecessors 
makes  it  a  probabilistic  problem  with  respect  to  completion  time. 


INTRODUCTION 

The  decision  to  fast-track  a  project  is  very  risky  [1]  and  should  be  made  only  after  serious  analysis  of  the 
parameters  involved  in  the  process.  Fast-tracking  a  project  involves  consumption  of  additional  resources 
that  have  to  be  brought  into  the  project  at  additional  cost.  Minimizing  the  cost  of  fast-tracking  a  project  can 
be  very  decisive  in  reducing  risks.  The  model  discussed  in  this  paper  analyses  the  project  network  and 
assigns  the  proper  "crashing"  time  to  activities  that  can  be  "crashed"  in  order  to  fast-track  the  project. 

When a decision is made to fast-track, the critical activities are "crashed" based on their "crashability" and
their "crash cost" because the critical path defines the project duration. The model is then extended to
activities that are "near critical" relative to the amount of time to which the project should be fast-tracked.

An immediate consequence of these risks is that, even when the necessary resources are allocated to
activities, the project may fail to respond, so that the plan simply adds more expenses to
the project cash flow [2]. Figure 1 describes the process of fast-tracking a construction project. Two
scenarios appear in the diagram, a traditional approach and a fast-track approach. The traditional approach
consists of sequencing design and construction phases, whereas the fast-track approach consists of
overlapping and reducing the duration of the design and construction phases. Each phase represents a
subproject of the entire project and is formed by a chain of activities, such that compressing the duration of
each subproject involves "crashing" activities that are critical to the network representing these
subprojects. The process of "crashing" an activity on the network depends on several factors such as:

•  The  "crashability"  (how  far  can  an  activity  be  crashed?) 

•  The  cost  of  "crashing" 

•  The  type  of  activity 

•  The  availability  of  resources 

The  "crashability"  of  an  activity  is  defined  as  the  maximum  time  to  which  an  activity  can  be  compressed 
and  still  be  economic.  The  cost  of  compressing  an  activity  increases  exponentially  beyond  this  value  [3]. 
Figure  2  depicts  the  concept  of  "crashability"  and  shows  the  typical  cost  behaviour  of  activity  time.  As  an 
activity  is  compressed,  the  cost  decreases,  reaching  a  minimum,  and  then  increases  rapidly  to  an  asymptotic 
value  at  some  minimum  activity  time.  The  minimum  cost  defines  the  normal  cost  of  an  activity. 




Fig. 1. Model of a fast-tracked construction project
$t_d = xT_d$, $T^*_d = yT_d$ and $T^*_c = zT_c$
where x, y, z are such that 0 < x < 1, 0 < y < 1, and 0 < z < 1.


Figure  2.  "Crashing"  cost  as  a  function  of  time 

The cost of crashing an activity is generally based on data collected from records of past projects that were
executed in the same environment and also from Project Managers who have expertise in this domain.
Consider a project that has a total of n critical paths (a project can have more than one critical path). The
expected total duration of the project is T and a decision is made to reduce the duration to T'.

$n$ = number of critical paths

$m_i$ = number of critical activities on critical path $i$ (two different paths can have common activities).

Path 1 = $\{a_{11}, a_{12}, \ldots, a_{1m_1}\}$

Path n = $\{a_{n1}, \ldots, a_{nm_n}\}$

Critical Path i

Path i = $\{a_{i1}, \ldots, a_{im_i}\}$

$d_{ki}$ = duration of critical activity $k$ on critical path $i$

$c_{ki}$ = cost per day of crashing critical activity $k$ on critical path $i$

$\beta_{ki}$ = upper limit of the number of days by which critical activity $k$ on critical path $i$ can be crashed

$\alpha_{ki}$ = number of days to "crash" critical activity $k$ on critical path $i$

$\alpha = T - T'$ = number of days by which the project is expected to be fast-tracked




Cost  of  Crashing  Path  i 

The  total  cost  of  crashing  the  critical  path  i  is  the  sum  of  the  cost  of  crashing  all  critical  activities  on  the 
path: 

$$TC_i = \sum_{k=1}^{m_i} \alpha_{ki} \cdot c_{ki} \qquad (1')$$

Cost  of  Fast-Tracking  the  Project 

The  total  cost  of  fast  tracking  the  project  is  the  sum  of  the  total  cost  of  the  n  critical  paths  of  the  project: 

$$TC = \sum_{i=1}^{n} TC_i = \sum_{i=1}^{n} \sum_{k=1}^{m_i} \alpha_{ki} \cdot c_{ki} \qquad (2')$$

Equation  2'  represents  the  additional  cost  of  fast  tracking  the  project.  The  objective  of  any  project  planner 
who  considers  fast  tracking  is  to  minimize  the  total  cost  of  using  more  resources  for  the  project.  The 
problem  of  allocating  the  proper  "crashing"  time  becomes  an  optimization  problem. 

OPTIMIZATION  PROBLEM 

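Minimize

$$\sum_{i=1}^{n} \sum_{k=1}^{m_i} \alpha_{ki} \cdot c_{ki} \qquad (3')$$

subject to

$$\sum_{k=1}^{m_i} \alpha_{ki} = \alpha \quad \text{for every } i \qquad (4')$$

$$\alpha_{ki} \le \beta_{ki} \quad \text{for any } i, k \text{ such that } 1 \le i \le n \text{ and } 1 \le k \le m_i \qquad (5')$$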

For  each  activity,  the  proper  "crashing"  time  will  determine  how  much  of  what  type  of  resources  is 
necessary  to  complete  the  work.  Construction  managers  usually  keep  track  of  these  data  in  order  to  use 
them  when  needed  in  the  next  project.  So,  when  provided  with  the  optimum  "crashing"  time,  they  can 
derive  the  amount  of  additional  resources  required. 
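For a single critical path, the allocation problem above can be sketched in a few lines. The function name and the numeric values are hypothetical; the sketch assumes a constant cost per day up to each upper limit and does not handle activities shared between paths, for which a general linear programming solver would be needed.

```python
def allocate_crash_days(cost_per_day, upper_limit, alpha):
    """Split `alpha` crash days over the activities of one path at minimum cost.

    cost_per_day[k] and upper_limit[k] play the roles of c_k and beta_k.
    The cheapest activities are crashed first, which is optimal here because
    the cost is linear and the only coupling is the single sum constraint.
    """
    if sum(upper_limit) < alpha:
        raise ValueError("path cannot be crashed by the requested amount")
    days = [0] * len(cost_per_day)
    remaining = alpha
    for k in sorted(range(len(cost_per_day)), key=lambda k: cost_per_day[k]):
        days[k] = min(upper_limit[k], remaining)
        remaining -= days[k]
    total = sum(d * c for d, c in zip(days, cost_per_day))
    return days, total


# Hypothetical path with three activities (illustrative values only).
days, total = allocate_crash_days([120, 70, 98], [4, 5, 3], alpha=8)
print(days, total)
```

The returned list gives the crash days per activity in the input order, from which the additional resources required can then be derived.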

It is important to underline that "crashing" critical activities by the amount determined by the
optimization problem may not be achieved in practice, creating risks related to uncertainties. A probabilistic approach
must therefore be introduced in order to reflect reality. Once again, the probability distributions of the
success of "crashing" activities are built from past projects having some common properties in the
environment in which they were completed.

Each  critical  activity  will  then  have  a  set  of  possible  crashing  times  to  which  is  assigned  a  probability 
distribution.  These  probabilities  will  then  be  simulated  to  generate  more  data  in  order  to  conduct  a 
statistical  analysis.  The  fast-tracked  project  completion  time  becomes  probabilistic.  An  example  will 
illustrate  the  technique. 

A  "near  critical"  path  in  our  model  is  one  that  is  not  critical  but  has  its  total  duration  satisfying  the 
following  condition: 

If  T*  is  the  "near  critical"  path  length,  T  is  the  initial  project  length  and  T'  is  the  desired  fast-tracked  project 
length,  then: 

$$T' < T^* < T \qquad (6')$$

Compressing the project length from T to T' will automatically make the "near critical" paths
critical after the time compression.

$n^*$ = number of near critical paths

$m^*_j$ = number of activities on "near critical" path $j$ (two different paths can have common activities).

Path 1 = $\{a^*_{11}, a^*_{12}, \ldots, a^*_{1m^*_1}\}$

Path n* = $\{a^*_{n^*1}, \ldots, a^*_{n^*m^*_{n^*}}\}$

"Near Critical" Path j

Path j = $\{a^*_{j1}, \ldots, a^*_{jm^*_j}\}$

$d^*_{kj}$ = duration of "near critical" activity $k$ on "near critical" path $j$

$c^*_{kj}$ = cost per day of crashing "near critical" activity $k$ on "near critical" path $j$

$\beta^*_{kj}$ = upper limit of the number of days by which "near critical" activity $k$ on path $j$ can be crashed

$\alpha^*_{kj}$ = number of days to crash "near critical" activity $k$ on path $j$

$\alpha^*_j = T^*_j - T'$, where $T^*_j$ = length of "near critical" path $j$.

The  total  cost  of  fast  tracking  the  near  critical  paths  is 

$$TC^* = \sum_{p=1}^{n^*} \sum_{l=1}^{m^*_p} \alpha^*_{lp} \cdot c^*_{lp}$$

A "near critical" path is one whose duration lies between the fast-track time and the expected total duration,
i.e., $T' < T^* < T$. Applying Equation 6' to the model, the optimization problem becomes:


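Minimize

$$\sum_{i=1}^{n} \sum_{k=1}^{m_i} \alpha_{ki} \cdot c_{ki} + \sum_{p=1}^{n^*} \sum_{l=1}^{m^*_p} \alpha^*_{lp} \cdot c^*_{lp} \qquad (1)$$

Subject to:

$$\sum_{k=1}^{m_i} \alpha_{ki} = \alpha \quad \text{for any } i \qquad (2)$$

$$\sum_{l=1}^{m^*_p} \alpha^*_{lp} = \alpha^*_p \quad \text{for any } p \qquad (3)$$

$$\alpha = T - T' \qquad (4)$$

$$\alpha^*_p = T^*_p - T' \qquad (5)$$

$$\alpha_{ki} \le \beta_{ki} \quad \text{for any } i, k \text{ such that } 1 \le i \le n \text{ and } 1 \le k \le m_i \qquad (6)$$

$$\alpha^*_{lp} \le \beta^*_{lp} \quad \text{for any } l, p \text{ such that } 1 \le p \le n^* \text{ and } 1 \le l \le m^*_p \qquad (7)$$

$$T' < T^*_p < T \qquad (8)$$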

APPLICATION 



Fig.  3.  Arrow  on  Arrow  network  of  example  discussed. 


Figure 3 is the scheduled network of a construction project composed of eleven activities. The numbers in
the squares represent the earliest and latest times for each activity, determined by the forward and backward
pass methods respectively. The numbers next to the arrows are the estimated duration of each activity. The
critical path of the network (bolded arrows in Figure 3) is composed of activities 1-4, 4-8 and 8-9 and has a




length  of  30  days,  which  represents  the  total  duration  of  the  project.  Interruption  of  an  activity  after  having 
begun  the  activity  is  not  considered  here,  although  in  practice,  an  activity  may  be  interrupted  by,  for 
example,  severe  weather  or  other  unpredictable  factors. 
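The forward and backward passes can be sketched as follows. Because the durations printed in Figure 3 are not recoverable here, the sketch assumes the simulated durations reported in Table 2; the variable names are illustrative. With these values it reproduces the 30-day project length and the critical activities 1-4, 4-8 and 8-9.

```python
# Arc (i, j) -> duration in days (assumed: simulated durations of Table 2).
durations = {
    (1, 2): 2, (1, 4): 5, (1, 7): 3, (2, 3): 9, (3, 6): 3, (4, 5): 10,
    (4, 8): 15, (5, 6): 8, (6, 9): 6, (7, 8): 6, (8, 9): 10,
}
nodes = sorted({n for arc in durations for n in arc})

# Forward pass: earliest event times. Node numbers increase along every arc,
# so visiting arcs in sorted order respects the precedence relations.
earliest = {n: 0 for n in nodes}
for (i, j), d in sorted(durations.items()):
    earliest[j] = max(earliest[j], earliest[i] + d)

# Backward pass: latest event times, starting from the project end.
latest = {n: earliest[nodes[-1]] for n in nodes}
for (i, j), d in sorted(durations.items(), reverse=True):
    latest[i] = min(latest[i], latest[j] - d)

# An activity is critical when its total float is zero.
critical = [arc for arc, d in sorted(durations.items())
            if latest[arc[1]] - earliest[arc[0]] - d == 0]

print("project length:", earliest[nodes[-1]])
print("critical activities:", critical)
```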


It  was  decided  that  the  project  would  be  fast-tracked  to  reduce  its  total  duration  from  30  days  to  22  days. 
By  doing  so,  the  entire  network  must  be  recalculated.  The  initial  critical  path  will  still  be  critical,  but  path 
1-4-5-6-9  with  an  initial  length  of  29  days  becomes  automatically  critical  for  the  new  network.  The  new 
network  will  have  two  critical  paths  after  the  "crash".  Table  1  is  an  example  of  a  probability  distribution 
assigned  to  the  completion  time  of  activity  1-2.  Initially,  the  activity  was  to  be  completed  in  4  days  but 
based  on  an  analysis  of  the  uncertainties,  the  completion  time  will  now  lie  between  2  and  5  days.  These 
numbers  are  usually  determined  from  past  project  performance. 


Table  1.  Probability  distribution  of  activity  1-2. 


Activity 1-2

Cumulative Probability    Time (days)
0.00                      2
0.10                      3
0.35                      4
0.70                      5
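One way to draw completion times from Table 1 is sketched below. It assumes the cumulative probabilities are lower thresholds of a lookup table, so that a uniform random number between a threshold and the next one maps to the listed time; this reading, the function names and the seed are assumptions, not the authors' spreadsheet.

```python
import bisect
import random

# Table 1 read as a lookup table (assumed interpretation): a uniform draw r
# with THRESHOLDS[k] <= r < THRESHOLDS[k + 1] yields TIMES[k].
THRESHOLDS = [0.00, 0.10, 0.35, 0.70]
TIMES = [2, 3, 4, 5]


def completion_time(r):
    """Map a uniform random number in [0, 1) to a completion time in days."""
    return TIMES[bisect.bisect_right(THRESHOLDS, r) - 1]


def simulate(n_runs, seed=0):
    """Draw n_runs completion times for activity 1-2."""
    rng = random.Random(seed)
    return [completion_time(rng.random()) for _ in range(n_runs)]


samples = simulate(50)
print("mean simulated completion time:", sum(samples) / len(samples))
```

Under this reading the implied probabilities of 2, 3, 4 and 5 days are 0.10, 0.25, 0.35 and 0.30 respectively.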

Probability distributions are also determined for all other activities on the network and the notion of critical
and "near critical" paths then becomes probabilistic. Usually, there is not enough data to conduct a robust
analysis, so in order to support a decision, data like those presented in Table 1 are generated by
simulation to provide a sample large enough for a statistical analysis. The simulation results are in fact
different scenarios into which the project might fall at a certain time relative to the uncertainties. Table 2
displays the results of the simulation performed on the example network using Microsoft Excel. The
numbers 1 and 0 are used to express affirmation (yes) or negation (no) about the criticality and the "near
criticality" of the activities. The algorithm also ensures that a path cannot be both critical and "near
critical" at the same time. However, an activity can be both critical and "near critical" when two
different paths share the same activity, as is the case for paths 1-4-8-9 and 1-4-5-6-9, which share activity 1-4.
For simplicity, the following path names are used:

Path # 1 is composed of the chain of activities 1-2-3-6-9, Path # 2 is composed of the chain of
activities 1-4-5-6-9, Path # 3 is composed of the chain of activities 1-4-8-9 and Path # 4 is composed of the
chain of activities 1-7-8-9.


Table  2:  Results  of  the  simulated  network 


Activity   Simulated duration (days)   Critical?   Near critical?
1-2        2                           0           0
1-4        5                           1           0
1-7        3                           0           0
2-3        9                           0           0
3-6        3                           0           0
4-5        10                          0           0
4-8        15                          1           0
5-6        8                           0           0
6-9        6                           0           0
7-8        6                           0           0
8-9        10                          1           0
Total      [entries not legible]

Path information

Path   Length (days)   Critical?   Near critical?
1      20              0           0
2      29              0           1
3      30              1           0
4      19              0           0

Project length: 30 days. Meet deadline? [entry not legible]
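The path lengths and the critical or "near critical" flags of Table 2 can be reproduced with a short sketch. The durations and path definitions come from the text and Table 2; the function and variable names are illustrative.

```python
# Simulated activity durations in days (Table 2).
durations = {
    "1-2": 2, "1-4": 5, "1-7": 3, "2-3": 9, "3-6": 3, "4-5": 10,
    "4-8": 15, "5-6": 8, "6-9": 6, "7-8": 6, "8-9": 10,
}
# Path numbering as defined in the text.
paths = {
    1: ["1-2", "2-3", "3-6", "6-9"],
    2: ["1-4", "4-5", "5-6", "6-9"],
    3: ["1-4", "4-8", "8-9"],
    4: ["1-7", "7-8", "8-9"],
}
T, T_PRIME = 30, 22  # initial and desired project lengths in days


def classify(length, t_initial, t_target):
    """Critical if the path sets the project length; near critical if T' < T* < T."""
    if length >= t_initial:
        return "critical"
    if t_target < length < t_initial:
        return "near critical"
    return "neither"


path_lengths = {p: sum(durations[a] for a in acts) for p, acts in paths.items()}
labels = {p: classify(length, T, T_PRIME) for p, length in path_lengths.items()}
for p in sorted(paths):
    print(p, path_lengths[p], labels[p])
```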


The  results  of  the  simulation  indicate  that  Path  #  2  is  "near  critical"  and  Path  #  3  is  critical  and  that  the 
activities  to  be  "crashed"  lie  on  these  paths.  In  reality,  the  simulation  is  run  at  least  50  times  and  the  moving 
average  of  all  these  results  is  considered.  Solving  the  optimization  problem  with  EXCEL  using  "Solver", 
we  obtained  the  results  shown  in  Table  3. 




Table  3.  Solution  to  the  optimization  problem. 


Path   Critical   Near critical   Crash    Activities crashed   Crash cost per day   Upper limit
1      No         No              0        None
2      No         Yes             7 days   [not legible]        $98, $187, $140      3 days, 2 days (one entry not legible)
3      Yes        No              8 days   [not legible]        $98, $70             3 days, 5 days
4      No         No              0        None

Objective function: $1592

The algorithm recognizes shared activities, as in the case of activity 1-4 which is shared by Path #2 and
Path #3. The objective function represents the total additional cost of compressing the network. The results
in Table 3 suggest that by compressing activities 1-4, 4-5, 6-9 and 4-8 by 3 days, 2 days, 2 days, and 5 days
respectively, the project length can be reduced from 30 days to 22 days. Figure 4 is the new Arrow on
Arrow network of the "crashed" project. The dashed line arrows represent activities that have been
"crashed" as a result of the solution to the optimization problem. The bold and dashed arrows represent the
critical path(s) of the new network.


Fig.  4.  Arrow  on  Arrow  network  of  the  "crashed"  project. 
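The reported solution can be checked with a short sketch. The durations are those of Table 2 and the crash days are those stated in the text. Two points are assumptions: the per-day costs of $187 and $140 are assigned to activities 4-5 and 6-9 in the order printed in Table 3 (the total is unaffected, since both are crashed by 2 days), and the objective is evaluated path by path, so that the shared activity 1-4 is counted once for each path, as in the double sum of the objective function. Under that reading the printed value of $1592 is reproduced.

```python
durations = {
    "1-2": 2, "1-4": 5, "1-7": 3, "2-3": 9, "3-6": 3, "4-5": 10,
    "4-8": 15, "5-6": 8, "6-9": 6, "7-8": 6, "8-9": 10,
}
paths = {
    1: ["1-2", "2-3", "3-6", "6-9"],
    2: ["1-4", "4-5", "5-6", "6-9"],
    3: ["1-4", "4-8", "8-9"],
    4: ["1-7", "7-8", "8-9"],
}
crash_days = {"1-4": 3, "4-5": 2, "6-9": 2, "4-8": 5}        # from the text
cost_per_day = {"1-4": 98, "4-5": 187, "6-9": 140, "4-8": 70}  # assumed assignment

# Path lengths after crashing; the longest one is the new project length.
crashed_length = {
    p: sum(durations[a] - crash_days.get(a, 0) for a in acts)
    for p, acts in paths.items()
}

# Objective summed over the near critical path (2) and the critical path (3).
cost_by_path = {
    p: sum(crash_days.get(a, 0) * cost_per_day.get(a, 0) for a in paths[p])
    for p in (2, 3)
}
objective = sum(cost_by_path.values())

# Cost when every crashed activity is counted only once.
cost_once = sum(crash_days[a] * cost_per_day[a] for a in crash_days)

print(crashed_length, objective, cost_once)
```

Counting activity 1-4 only once would give a lower figure than the printed objective, which suggests that the spreadsheet model sums the crash cost over paths rather than over distinct activities.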


CONCLUSION 

Although  the  model  works  in  theory,  it  is  necessary  to  first  apply  different  models  of  resource-leveling  to 
the  project  before  proceeding.  Our  model  does  not  solve  resource-conflict  problems  but  could  probably  be 
inserted  into  resource-leveling  models  in  future  research.  The  model  has  the  following  advantages: 

•  It  identifies  and  analyses  activities  that  are  critical  or  "near  critical"; 

•  Based  on  the  upper  limits  (maximum  amount  of  time  an  activity  can  be  crashed),  it  determines  the 
number  of  days  to  crash  an  activity  at  minimum  cost  to  the  overall  project. 

•  The  model  includes  a  probabilistic  analysis  approach  related  to  the  uncertainty  of  estimators  and 
provided  by  the  application  of  the  simulation. 

ACKNOWLEDGEMENT 

Special  thanks  to  Dr  John  Meech  for  his  constructive  criticism  during  the  elaboration  of  this  paper.  His 
remarks  have  been  very  useful  in  building  this  model. 

REFERENCES

1.  Sproule,  J.A.,  1992.  M.A.Sc.  Thesis,  University  of  British  Columbia,  Civil  Engineering. 

2. Nahmias, S., 1997. Production and Operations Analysis. 3rd Edition. McGraw-Hill Publishing Co.

3.  Burman  P.J.,  1972.  Precedence  networks  for  project  planning  and  control.  McGraw-Hill  (UK),  London. 




Hybrid  Simulation  Objects  using  Fuzzy  Set  Theory  for 
Simulation  of  Innovative  Process  Chains 

T.  Menzel,  M.  Geiger 

Chair of Manufacturing Technology, University of Erlangen-Nuremberg,
Egerlandstr.  11,  91058  Erlangen,  Germany 


ABSTRACT 

The  purpose  of  introducing  new  manufacturing  process  chains,  such  as  innovative  sheet  metal  processing 
technologies,  is  to  increase  production  efficiency.  Their  success  can  be  quantified  by  assessing  the  expected 
reduction  of  manufacturing  time  or  costs  and  the  increase  in  flexibility.  Nevertheless,  the  implementation  of 
new  process  chains  is  usually  inhibited  by  the  financial  effort  and  the  lack  of  exact  knowledge  about  their 
real  effect  on  production  efficiency.  Furthermore,  the  economic  efficiency  is  mainly  dependent  on  the 
vaguely  known  quantity  and  variety  of  products  which  have  been  planned  for  future  production.  Therefore, 
a  hybrid  system  using  dynamic  simulation  objects  and  fuzzy  set  theory  is  designed  to  assess  its  efficiency 
and  flexibility  in  advance.  Its  design  and  application  will  be  explained  by  the  example  of  the  hydroforming 
process  chain  of  sheet  metal  pairs,  an  innovative  sheet  metal  forming  technology. 

INTRODUCTION 

In general, the introduction of innovations into industry involves the implementation or substitution of
products or manufacturing techniques [1]. This paper deals with process innovations in sheet metal
processing,  which  are  characterized  by  the  new  application  of  manufacturing  techniques  for  the  manufacture 
of  products  or  product  groups.  Their  implementation  intends  to  improve  the  flexibility  and  the  efficiency  of 
production  (manufacturing  costs  and  time).  These  goals  emphasize  the  economic  importance  of  applying 
new  techniques  in  ensuring  the  success  of  an  enterprise  in  the  long  term.  However,  their  implementation 
into  industrial  production  is  usually  inhibited  by  the  financial  effort  and  the  lack  of  precise  information 
about  their  capabilities.  Moreover,  the  efficiency  depends  on  the  dynamic  market  demand  for  workpiece 
variants  and  their  quantities,  which  are  not  exactly  known.  Therefore,  an  important  task  of  intelligent 
production  planning  is  to  predict  the  economic  success  of  applying  new  manufacturing  processes  in 
advance. 

In  recent  years,  simulation  methods  have  become  increasingly  important  in  the  field  of  production  planning 
(see  [2]).  The  common  applications  focus  primarily  on  simulations  for  short  term  planning.  They  are 
applied  to  optimize  the  sequencing  of  products  or  the  layout  of  production  lines.  These  methods,  however, 
do not evaluate the economic efficiency of applied manufacturing techniques. Other approaches assess
innovations in technology by evaluating technology-affected attributes without
simulating the innovative production process (e.g. [3]). As a result, the causes of and recommendations for the
implementation cannot be derived from a model of the intended process chain. In contrast, the described
system simulates the innovative production process taking into account that:

• the criteria for success might be contrary to one another (e.g. the flexibility and the costs),

•  the  manufacturing  process,  the  products’  quantity  and  variety,  and  the  economic  goals  interact  by 
complex  dependencies, 

•  the  exact  quantification  of  technology  affected  attributes  is  hampered  by  the  lack  of  exact  process 
knowledge. 

CASE  STUDY:  HYDROFORMING  OF  SHEET  METAL  PAIRS 

The  process  chain  of  hydroforming  of  sheet  metal  pairs  is  taken  as  an  example.  The  process  chain 
consisting  of  the  process  steps  hydro-preforming,  trimming,  welding  and  hydro-calibrating  aims  at 
manufacturing metallic hollow structures (Fig. 1). It can be partially substituted for the conventional process
chain (consisting of the deep drawing of the upper and the lower shell in multiple stages and the welding of


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




them;  [4]).  A  reduction  in  manufacturing  time  and  workcenter  costs  is  expected  due  to  fewer  process  steps 
and  less  handling  and  change-over  time  between  the  multiple  stages  of  the  deep  drawing  process.  In 
contrast,  the  process  chain  causes  higher  workcenter  cost  rates  because  of  the  investment  costs  for  the 
complex  forming  tool,  the  hydraulic  device  and  the  docking  system  (see  [4]).  Furthermore,  the  flexibility 
might  be  reduced  because  of  the  expensive  specialized  forming  tool.  The  assessment  of  the  economic 
efficiency  of  the  hydroforming  process  chain  has  to  consider  the  technology  as  well  as  the  planned  spectrum 
of  workpieces. 


Fig.  1.  The  process  chain  of  hydroforming  of  sheet  metal  pairs. 


In  order  to  evaluate  the  economic  consequences  of  the  implementation  of  the  hydroforming  technique,  a 
complex  network  of  interdependencies  has  been  taken  into  consideration  (Fig.  2):  It  contains  the  attributes 
of  these  workpieces  which  are  planned  to  be  produced.  They  are  represented  in  the  example  by  their 
quantity  and  variants.  Because  of  their  dependency  on  changing  demand,  they  are  linked  to  the  investigated 
planning  time.  The  attributes  of  the  process  steps,  their  involved  machines,  and  the  tools  are  depicted  in 
terms  of  depreciation  costs  and  capacity  utilization.  As  indicated  by  the  arrows,  the  utilization  must  take 
into  account  its  dependency  on  manufacturing  time  and  the  produced  quantity.  Moreover,  the  utilization 
affects  the  depreciation  costs.  In  turn,  the  manufacturing  cost  depends  on  depreciation  costs  and 
manufacturing  time.  The  goals  of  process  chain  implementation  are  in  terms  of  flexibility  and  efficiency 
(manufacturing  time  and  costs).  Furthermore,  the  frequency  as  well  as  the  costs  associated  with  change  over 
time  affects  the  flexibility  in  sequencing  of  workpiece  variants,  keeping  in  mind  that  each  workpiece  variant 
requires  a  shape  dependent  forming  tool.  Finally,  the  flexibility  and  the  efficiency  influence  risks  and 
chances  involved  in  process  chain  implementation.  To  highlight  the  diverse  interdependencies,  the  arrows  in 
Figure  2  are  accompanied  by  plus  or  minus  signs  indicating  positive  and  negative  effects,  respectively. 


[Figure 2 (diagram). Recoverable node labels: Planning Time; Workpiece Quantity; Workpiece Variants; Batch Size; Produced Quantity; Utilisation; Change-Over Effort; Depreciation Costs; Manufacturing Time; Flexibility in Product Sequence; Goals of Efficiency; Chances/Risks.]

Fig.  2.  Effects  between  the  goals  and  the  attributes  of  product  and  process  chain. 

The  simplified  representation  in  Figure  2  indicates  the  complexity  of  relationships  between  the  attributes  of 
manufacturing  process  and  the  intended  goals  of  the  innovative  process  chain.  Furthermore,  the  quantity 
and  variants  of  products  depend  on  the  dynamic  behavior  of  market  demand.  Therefore,  the  dependencies 
have  been  evaluated  with  regard  to  the  planning  period.  It  results  in  a  dynamic  scenario  which  has  been 
modeled  and  simulated  to  decide  whether  or  not  the  innovative  process  chain  will  satisfy  the  economic 
goals.  The  large  number  of  interdependencies  requires  a  comprehensive,  dynamic  simulation  concept  which 
is  capable  of  modeling  all  important  factors  and  their  mutual  interactions. 




COMPONENTS  OF  THE  MANUFACTURING  PROCESS 

In  order  to  cope  with  complexity,  a  model  structure  of  the  investigated  technology  defining  various 
simulation  objects  has  been  created.  These  objects  define  an  abstract  layer  of  the  system.  They  are  designed 
to  represent  separately:  the  process  steps;  the  machines;  the  tools;  the  workpieces;  and  the  human  being  or 
operator.  For  each  component,  a  set  of  significant  attributes  is  assigned  and  quantified  in  order  to  define 
their  behavior.  For  example,  a  tool  or  a  machine  is  described  by  its  expected  utilization,  investment  and 
depreciation  cost,  effective  life,  etc.  The  process  step  is  characterized  by  the  flow  of  workpieces,  the 
associated  processing  time,  the  workcenter  costs  or  the  effort  for  flexible  changing  in  product  variants  etc. 
The  workpiece  variations  are  represented  essentially  by  their  quantity  and  variants.  Furthermore,  short  term 
planning  strategies,  such  as  choice  of  batch  size,  are  linked  to  this  aspect.  The  human  being,  i.e.  the 
operator,  can  increase  efficiency  based  on  experience  gained.  This  effect  is  described  as  the  learning  curve. 

In order to design a dynamic simulation scenario, a set of initial attribute values has to be quantified for
each component. In contrast to the well-researched simulation of existing manufacturing systems, the
simulation of innovative process chains, which may not yet be implemented, needs a completely different
initialization of the scenario. The definition has to take into account that the lack of concrete knowledge
obstructs the exact definition of these attributes. Furthermore, several attributes are characterized by a
vaguely known dynamic behavior (e.g. quantity of workpieces, utilization of machines).


For  example,  instead  of  a  well  known  deep  drawing  tool,  a  set  of  suitable  forming  tools  can  be  incorporated 
in  the  scenario.  To  define  their  initial  attributes,  the  lower  and  upper  range  as  well  as  the  most  likely  value 
of  cost,  e.g.  average  investment  costs,  have  been  identified.  This  approach  results  in  a  fuzzy  set  defined  by 
the  minimal,  maximal  and  the  mean  values  (Fig.  3).  The  mathematical  description  of  these  fuzzy  sets  is 
formally given by an LR fuzzy number [5]. The continuous membership function $\mu(x)$ of a fuzzy number
determines an uncrisp (fuzzy) numeric value by the real triple $(m, \alpha, \beta)$, where $m$, $\alpha$ and $\beta$ are the mean value,
the left support and the right support (Fig. 3, [5]). The shape of the fuzzy set can be influenced by choosing
functions $L(u)$ and $R(u)$ for the left and the right hand side, respectively, such as

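$$
\mu(x) =
\begin{cases}
L\left(\dfrac{m - x}{\alpha}\right), & \text{for } x \le m \\
R\left(\dfrac{x - m}{\beta}\right), & \text{for } x > m
\end{cases}
$$

and $L(u) = R(u) = \max\{0, 1 - u\}$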

which leads to a piecewise linear membership function $\mu$ [5, 6]. A membership degree $\mu(x) = 1$ means a high
degree of possibility. In contrast, a membership degree $\mu(x) = 0$ indicates that the related values are
impossible. Fuzzy numbers are often associated with a linguistic term, such as "approximately 30
thousand Euro" for the investment costs of the tool.
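The membership function above can be evaluated directly. The following is a minimal sketch, assuming the linear shape functions L(u) = R(u) = max{0, 1 - u}; the function name and the supports of 5 and 10 thousand Euro are illustrative assumptions, not values from the study.

```python
def membership(x, m, alpha, beta):
    """Piecewise linear LR membership with L(u) = R(u) = max(0, 1 - u).

    m is the mean value, alpha the left support, beta the right support
    (both supports are assumed to be strictly positive).
    """
    if x <= m:
        u = (m - x) / alpha   # left branch, argument of L
    else:
        u = (x - m) / beta    # right branch, argument of R
    return max(0.0, 1.0 - u)


# Hypothetical tool investment: "approximately 30 thousand Euro",
# with assumed supports of 5 (left) and 10 (right) thousand Euro.
tool_cost = (30.0, 5.0, 10.0)
for x in (25.0, 27.5, 30.0, 35.0, 40.0):
    print(x, membership(x, *tool_cost))
```

The degree is 1 at the mean value and falls linearly to 0 at m - alpha and m + beta, which corresponds to the minimal, most likely and maximal estimates described above.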




A further application of fuzzy set theory occurs with the specification of the vaguely known dynamic
behavior of initial features. A good example would be the expected quantity of a workpiece variant
(workpiece component). It can be estimated for different planning periods (e.g. a quarter of a year), which
form the premise of the rule system. The planning periods are represented by trapezoidal fuzzy numbers. According to the
premise, an uncrisp number of expected demand must be defined for each planning period (the consequence).
The procedure described results in a fuzzy rule system as shown in Figure 4. Fuzzy rules and their
evaluation are applied to those simulation quantities that cannot be described using arithmetic functions.

The  Modular  Concept  of  the  Simulation  System 

Simulation  systems  are  designed  to  predict  future  developments  which  are  described  by  dependencies  of 
various  interacting  system  variables.  In  order  to  assess  the  economic  efficiency  of  innovative  technologies, 
the  complex  dependencies  between  the  attributes  of  manufacturing  components  and  the  efficiency  goals 
must  be  evaluated.  A  modular  concept  is  designed  for  representing  these  dependencies  in  a  formal  manner. 

Figure 5 shows the objects of the scheme, which are designed to represent the components of the manufacturing
process: the machines, the tools, the experience of the human being (operator), the workpieces, and
the process steps. The technology component aggregates the partial effects of efficiency as
evaluated by the process steps. The dependencies between these high level components are depicted by
arrows. The components and their interfaces define various interacting simulation objects; thus the
simulation scheme can be treated as an object-oriented simulation [7].


Fig.  5.  The  objects  of  the  simulation  system. 

The  separate  representation  of  planning  periods  by  planning  time  and  their  connection  towards  the 
workpiece  components  must  be  done  for  the  evaluation  of  dynamic  quantities  (fuzzy  rule  systems),  i.e.  the 
quantity  of  workpieces.  The  objects  of  the  1st  and  2nd  process  steps  include  the  flow  of  workpieces,  and  the 
features  of  manufacturing  time  and  costs  regarding  the  connected  workpieces,  tools  and  machines.  The 
components  of  tools  and  machines  are  linked  to  those  process  steps  which  require  their  capacity  and  cost 
features.  The  object  of  technology  contains  quantities  of  the  flexibility,  the  costs  and  the  efficiency  of 
production.  These  quantities  are  derived  from  cost  and  time  attributes  as  simulated  by  the  connected 
objects. 

DETAILED  DESIGN  OF  THE  SIMULATION  OBJECTS 

Each of these high level components consists of a set of simulation variables. Their type of design,
connection and behavior is similar to Forrester models, a framework for the simulation of complex dynamic
behavior [8]. The system variables are divided into stocks, converters and flows (Fig. 6). The converters
have  to  be  calculated  for  each  time  step,  regardless  of  their  last  state.  They  have  to  perform  an  arithmetic 
operation  or  the  evaluation  of  a  fuzzy  rule  system.  In  contrast,  stocks  act  as  the  memory  of  the  simulation 
system.  Their  state  at  any  time  depends  on  their  state  at  the  previous  time  step  and  on  directional  flows. 
Flows  into  a  stock  are  accumulated;  the  outgoing  flow  from  a  stock  causes  a  decrease  in  the  numerical  stock 
value.  The  value  of  any  flow  must  be  determined  using  flow  regulators,  which  are  described  by  time- 
dependent  differential  equations.  Stocks  are  always  zero  or  greater.  Sources,  depicted  in  Figure  6  as  clouds, 
act  as  stocks  with  unlimited  capacity. 
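The interplay of stocks, flows and converters can be illustrated with a short sketch. It assumes a simple explicit (Euler) time step and crisp values; the function name and the way the outflow is limited to keep the stock non-negative are assumptions made for illustration and are not taken from the simulation system described here.

```python
def simulate_stock(initial, inflow, outflow, steps, dt=1.0):
    """Euler update of a single stock.

    inflow and outflow are flow regulators: functions of (time, stock)
    returning a rate. The inflow is fed by a source of unlimited capacity.
    """
    stock = initial
    history = [stock]
    for k in range(steps):
        t = k * dt
        # Rates are recomputed at every time step, regardless of their last value.
        rate_in = inflow(t, stock)
        # The outflow may not drain more than the stock holds (stocks stay >= 0).
        rate_out = min(outflow(t, stock), stock / dt + rate_in)
        # The stock remembers its previous state and accumulates the net flow.
        stock = stock + dt * (rate_in - rate_out)
        history.append(stock)
    return history


# Hypothetical example: constant inflow of 2 and requested outflow of 5 per step.
print(simulate_stock(10.0, lambda t, s: 2.0, lambda t, s: 5.0, steps=5))
```

In this example the stock is drained until it reaches zero, after which the outflow is throttled to the inflow.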




The excerpt of the workpiece object serves as an example of detailed design (Fig. 6): The simulated variables
represent  the  attributes  of  the  workpiece,  designed  to  model  their  behavior  within  the  process  chain.  The 
conversion  of  demanded  workpieces  per  time  has  a  direct  connection  with  scenario  time.  This  is  designed 
to  incorporate  expected  demand  modeled  by  a  time-dependent  fuzzy  rule  shown  in  Figure  4.  In  turn,  the 
choice  of  batch  size  may  be  influenced  by  the  demand.  Therefore,  a  second  fuzzy  rule  system  is  designed  to 
formalize batch-size strategies as experienced during past production. The required workpieces per time are
accumulated in the stock on the left hand side (demanded workpieces, x). Its growth is described by the
equation dx/dt = w (or Δx/Δt = w), where w represents the current demand. To simulate the quantity of
produced  workpieces,  demand  is  compared  with  capacity.  So,  an  input  variable  (interface  to  the  objects 
representing  the  involved  process  steps)  aggregates  the  manufacturing  time,  taking  into  account  available 
capacity.  Indeed,  the  throughput  quantity  of  this  workpiece  variant  is  calculated  by  the  lesser  of  capacity 
and  demand.  This  is  accumulated  by  the  second  stock  (produced  workpieces).  The  variable  workcenter 
costs  is  connected  to  the  involved  process  steps,  from  which  the  cost  per  workpiece  is  derived.  The  output 
variables  connect  the  machines  and  process  steps  to  utilize  their  capacity,  to  relay  the  frequency  of  product 
changing,  or  to  incorporate  product  specific  attributes  into  a  final  assessment  of  the  process  chain. 
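The two stocks of the workpiece object can be sketched as follows. This is a simplified illustration with crisp numbers and a time step of one planning period, whereas the system described here works with fuzzy quantities and a fuzzy minimum; the function name and the demand and capacity figures are hypothetical.

```python
def simulate_workpiece(demand_per_period, capacity_per_period):
    """Accumulate demanded and produced workpieces period by period."""
    demanded = 0.0
    produced = 0.0
    rows = []
    for w, capacity in zip(demand_per_period, capacity_per_period):
        demanded += w                    # stock x grows by dx/dt = w
        throughput = min(capacity, w)    # lesser of capacity and demand
        produced += throughput           # second stock: produced workpieces
        rows.append((demanded, produced))
    return rows


# Hypothetical demand that outgrows a fixed capacity of 120 pieces per period.
for row in simulate_workpiece([100.0, 150.0, 200.0], [120.0, 120.0, 120.0]):
    print(row)
```

The growing gap between the two stocks shows the periods in which capacity, not demand, limits the throughput.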


[Figure 6 diagram; recoverable labels follow.]
Input variables and their objects: Planning Time (current scenario time); Involved Process Steps (manufacturing time); Involved Process Steps (workcenter costs).
Converters: fuzzy rule for batch size; fuzzy rule for demand per time; fuzzy minimum; fuzzy addition of workcenter costs.
Output variables and their objects: Involved Process Steps (batch size); Involved Machines (utilized capacity); Involved Machines (capacity consumption, utilization); Technology (manufacturing time); Technology (costs per workpiece).
Legend of simulation variables: source; converter; stock; flow with flow regulator ("valve"); input/output variable.

Fig.  6.  Excerpt  of  detailed  design  of  the  simulation  object  Workpiece  variant. 

The basic simulation concept is applied to all other components in the simulation system. Each simulation
object must represent its own state and interact with the others through predefined interfaces. Moreover, the
object-oriented concept allows the design of a set of reference objects for the components of the manufacturing
process. They can be incorporated in the scenario in multiple instances depending on workpiece
variety and the complexity of the process chain. Therefore, process chains with a variable number of process steps,
machines, tools and workpieces can be simulated.

EVALUATION  OF  A  SIMULATION  SCENARIO 

The simulation variables included in objects (introduced above) are quantified by numerical values for each
time step of the dynamic scenario. Because the scenario is initialized by fuzzy numbers and fuzzy rules,
the evaluation of a simulation object must be done by fuzzy number-based operations. Fuzzy rules are
evaluated by determining the membership degree for each fuzzy set of the premises. The membership
degree is assigned to the consequences (Fig. 3, [5]). Finally, the fuzzy sets of the consequence which have a
non-zero membership degree are aggregated. This evaluation method does not apply defuzzification, in order to
avoid the loss of the fuzzy sets' support.
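The rule evaluation described above can be sketched as follows. The text does not state how the consequences are aggregated, so the sketch assumes a max-min aggregation; the two rules, their planning periods (in months) and demand values are hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear between."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def triangle(y, m, alpha, beta):
    """Triangular LR membership with mean m and supports alpha, beta."""
    u = (m - y) / alpha if y <= m else (y - m) / beta
    return max(0.0, 1.0 - u)


# Hypothetical rules: planning period (premise) -> expected demand (consequence).
RULES = [
    ((0.0, 0.5, 2.5, 3.5), (1000.0, 200.0, 200.0)),  # first quarter
    ((2.5, 3.5, 5.5, 6.5), (1500.0, 300.0, 400.0)),  # second quarter
]


def fired_rules(t):
    """Return (degree, consequence) for every rule with non-zero degree at time t."""
    fired = []
    for premise, consequence in RULES:
        degree = trapezoid(t, *premise)
        if degree > 0.0:
            fired.append((degree, consequence))
    return fired


def aggregated_membership(y, fired):
    """Membership of demand y in the aggregated result (assumed max-min rule)."""
    return max((min(degree, triangle(y, *cons)) for degree, cons in fired),
               default=0.0)


fired = fired_rules(3.0)   # scenario time between the two quarters
print(fired)
print(aggregated_membership(1000.0, fired), aggregated_membership(1500.0, fired))
```

The result remains a fuzzy set over the demand axis; no single defuzzified value is computed, so the support of both consequences is retained.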

For further numerical operations (e.g. multiplication, addition, etc.) the extension principle, which extends
conventional numerical operations to fuzzy quantities, is applied. For an operation $*$ on two arguments with
membership functions $\mu_a(x)$ and $\mu_b(y)$, it is defined as [5, 6]:

$$
\mu_{a * b}(z) = \sup_{z = x * y} \min\{\mu_a(x), \mu_b(y)\} \tag{2}
$$
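For fuzzy sets with finitely many support points, the supremum in the extension principle becomes a maximum over all pairs (x, y) that produce the same z. The following sketch illustrates this under that discretization assumption; the function name and the two example sets are hypothetical.

```python
import operator
from collections import defaultdict


def extend(op, fuzzy_a, fuzzy_b):
    """Sup-min extension of a binary operation to discrete fuzzy sets.

    Each fuzzy set is a dict mapping a crisp value to its membership degree.
    """
    result = defaultdict(float)
    for x, mu_x in fuzzy_a.items():
        for y, mu_y in fuzzy_b.items():
            z = op(x, y)
            # Keep the largest min(mu_a(x), mu_b(y)) over all pairs giving z.
            result[z] = max(result[z], min(mu_x, mu_y))
    return dict(result)


about_2 = {1: 0.5, 2: 1.0, 3: 0.5}
about_3 = {2: 0.5, 3: 1.0, 4: 0.5}
print(extend(operator.add, about_2, about_3))
print(extend(operator.mul, about_2, about_3))
```

The sum peaks at 5 and the product at 6, the results of the two most possible values, while the support of each result is wider than that of either argument.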




In the case of LR fuzzy numbers, there is a set of predefined operators, such as [5, 6]:

$$
(m, \alpha, \beta)_{LR} \oplus (n, \gamma, \delta)_{LR} = (m + n, \alpha + \gamma, \beta + \delta)_{LR}
$$

$$
(m, \alpha, \beta)_{LR} \odot (n, \gamma, \delta)_{LR} = (mn, m\gamma + n\alpha, m\delta + n\beta)_{LR}
$$

for addition and multiplication, respectively. In order to derive only triangular fuzzy numbers with a linear
membership function, the result obtained by the extension principle can be slightly modified [5].
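The two operators can be written down compactly. The sketch below assumes positive fuzzy numbers, for which the multiplication formula is the usual triangular approximation of the exact product; the class and function names and the example quantities are illustrative.

```python
from typing import NamedTuple


class LR(NamedTuple):
    m: float      # mean value
    alpha: float  # left support
    beta: float   # right support


def lr_add(a, b):
    """(m, alpha, beta) + (n, gamma, delta) = (m + n, alpha + gamma, beta + delta)."""
    return LR(a.m + b.m, a.alpha + b.alpha, a.beta + b.beta)


def lr_mul(a, b):
    """Triangular product for positive LR numbers:
    (m n, m gamma + n alpha, m delta + n beta)."""
    return LR(a.m * b.m,
              a.m * b.alpha + b.m * a.alpha,
              a.m * b.beta + b.m * a.beta)


# Hypothetical quantities: investment costs in thousand Euro, batch size in
# pieces, and processing time per piece in minutes.
tool = LR(30.0, 5.0, 10.0)
machine = LR(200.0, 20.0, 40.0)
batch = LR(100.0, 10.0, 20.0)
time_per_piece = LR(2.0, 0.5, 0.5)

print(lr_add(tool, machine))          # total investment
print(lr_mul(batch, time_per_piece))  # processing time per batch
```

Both results are again triples of mean value and supports, so they can be passed on to further operations without leaving the LR representation.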

However,  application  of  fuzzy  operators  can  result  in  unexpected  values.  These  may  occur  if  an  improper 
function  appears  in  algebraic  terms  [6].  These  terms  contain  a  variable  or  a  function  of  this  variable  more 
than  once.  The  fuzzy  operators  try  to  extend  the  support  as  much  as  possible  without  accounting  for  any 
dependencies  between  their  arguments.  The  following  approaches  are  taken  into  account  by  model  design: 

•  avoiding terms that contain a variable more than once,

•  using isotonic functions [6],

•  evaluating numerically by the extension principle with additional conditions [6],

•  keeping the value of each stock independent of improper functions.

These approaches are usually realized by modifying the model structure or the evaluation method, which
are basically fixed by predefined components and the interface. These measures attempt to avoid improper
functions and/or minimize error propagation during simulation. Although defuzzification generally avoids
improper functions, it is not applied here because of the loss of a fuzzy set's support and its information.
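The effect of a variable appearing more than once can be seen in the simplest improper term, x - x. The sketch uses the standard LR subtraction formula, which is not given in the text above and is introduced here only for illustration; the value of x is hypothetical.

```python
def lr_sub(a, b):
    """Standard LR subtraction:
    (m, alpha, beta) - (n, gamma, delta) = (m - n, alpha + delta, beta + gamma)."""
    m, alpha, beta = a
    n, gamma, delta = b
    return (m - n, alpha + delta, beta + gamma)


x = (100.0, 10.0, 20.0)   # hypothetical quantity of workpieces
difference = lr_sub(x, x)
print(difference)         # mean 0, but both supports are alpha + beta
```

A crisp evaluation of x - x is exactly zero. The fuzzy operator treats both occurrences of x as independent arguments and returns a support of alpha + beta on each side, which is the kind of unexpected widening the measures listed above are meant to prevent.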


A  SCENARIO  FOR  THE  HYDROFORMING  OF  SHEET  METAL  PAIRS 

The  concept  is  applied  to  the  process  chain  of  hydroforming.  The  scenario  evaluates  the  required 
components  and  derives  the  risk  and  chances  of  the  process  chain  implementation.  Figure  7  shows  the 
simulation  object  technology  (see  also  Figure  5).  The  portfolio  technique  aims  to  compare  risks  and  chances 
[9].  Because  the  scenario  is  initialized  by  fuzzy  sets,  the  simulated  results  are  also  predicted  by  fuzzy  sets. 


[Figure 7 diagram; recoverable labels follow.]
Left: Simulation Scenario (Program Excerpt), including a Capacity quantity.
Right: The Causes of Undervalued Efficiency. Possible causes (quantities with insufficient values) as simulated in the Technology object: Flexibility; Technological; Costs; Flexibility in Product Sequencing.
Causes in other objects (workpiece, machines): variants of workpieces; change-over time and costs; batch sizing strategies; quantity of workpieces.
Final step: Optimization and Evaluation of Further Scenarios.

Fig.  7.  Predicted  risks  and  chances  for  hydroforming  technology  by  aggregating  the  flexibility, 
the  costs  and  the  efficiency  of  the  process  chain. 


An  excerpt  of  the  computer  program,  shown  on  the  left  hand  side  of  Figure  7,  is  completed  by  the  keywords 
indicating  the  sequence  of  aggregating  efficiency  goals.  It  is  recognized  that  the  chances  are  assessed  from 
the  predicted  flexibility  and  the  attained  efficiency  in  terms  of  cost  and  time.  Moreover,  it  can  be  seen  that 
flexibility  includes  capacity  (capability  to  adapt  to  changing  product  quantity)  and  technology  (capability  to 
adapt to changing variants). Simulation variables are valued with respect to the behavior of other simulation
objects, such as process steps, machines, etc. For example, flexibility in sequencing product variants
depends  on  frequency  and  expense  of  time  and  cost  to  change  the  product  variant  keeping  in  mind  that  each 
variant  requires  its  own  shape-dependent  forming  tool.  In  turn,  these  quantities  are  determined  by  the 
depreciation  cost  of  machinery,  change-over  time,  and  the  quantity  and  variety  of  products  and  their  batch 
size.  Risk  is  assessed  by  the  insufficient  utilization  of  expensive  process  techniques  and  investment  costs. 

Comparison of risks and chances by the portfolio technique [9] can reveal the potential of a process chain. The
portfolio identifies behaviors by the relations of chance and risk. Moreover, wide support for the predicted
fuzzy set indicates several highly sensitive initial attributes. To optimize the implementation of a process
chain, the causes of a simulation result are identified. For the hydroforming process chain,
causes of undervalued chances can be analyzed interactively as follows (Fig. 7):

Costs, manufacturing time and flexibility have been investigated. It is shown that technological flexibility is
especially undervalued. A more detailed investigation shows that flexibility in product-sequencing is the
most  undervalued  factor  involved  in  technological  flexibility,  indicating  high  effort  to  adapt  to  product 
variations.  The  investigation  focuses  on  the  simulation  of  objects  of  the  process  chain  and  process  steps. 
Small  batch  sizes,  small  quantities  of  workpieces,  high  frequency  and  effort  in  change-over,  combined  with 
high  investment  costs,  are  the  main  factors  responsible  for  an  undervalued  assessment.  With  this  knowledge, 
a first approach to optimisation is possible: the quantity of workpieces can be increased. Alternatively,
an  extended  batch  size  may  reduce  requirements  of  flexibility.  As  well,  more  flexible  tool  techniques  using 
modular  tools  or  shape-dependent  inserts  can  reduce  the  effort  to  adapt  to  product  variations.  These 
approaches  can  be  confirmed  and  investigated  by  further  scenarios  incorporating  intended  changes  in  the 
product  spectrum  or  process  technique  by  modified  attributes  of  manufacturing  components. 

SUMMARY 

The  design  of  hybrid  simulation  objects  using  fuzzy  set  theory  to  assess  implementation  of  innovative 
process  chains  in  sheet  metal  processing  has  been  demonstrated.  To  cope  with  complexity,  a  modular 
concept  is  used  to  represent  components  of  a  manufacturing  process,  their  behavior  and  interdependencies. 
In  particular,  integration  of  uncertain,  dynamic  behavior  of  innovative  process  chains  has  been  considered. 
Uncertainties  in  estimating  process  attributes  or  vaguely-known  dynamic  behaviors  are  accounted  for  by 
qualifying the simulation variables as fuzzy sets or fuzzy rules. This allows their specification using
membership  functions  which  enclose  a  variation  range  dependent  on  the  precision  of  the  knowledge.  These 
uncrisp  values  are  integrated  into  the  simulation  by  extending  dynamic  models  with  algebraic  fuzzy 
operators  and  fuzzy  rule  systems.  The  system  predicts  the  mean  efficiency  and  its  sensitivity.  Moreover,  it 
supports  an  analysis  of  the  causes  of  predicted  efficiency  to  provide  starting  points  for  optimisation. 

ACKNOWLEDGMENT 

The authors are grateful for support from the German Research Foundation (Deutsche Forschungsgemeinschaft,
DFG), under the research project "Assessment of the implementation of innovative technologies into
industry using simulation" and the activities in hydroforming within the special research topic SFB 396.

REFERENCES 

1. J. M. Utterback, 1994. Mastering the Dynamics of Innovation. Harvard Business School Press, Boston, Mass.

2. J. E. Lenz, 1986. Proc. 2nd International Conference on Simulation in Manufacturing. IFS, Chicago.

3. C. Ullmann, 1995. Methodik zur Verfahrensplanung von innovativen Fertigungstechnologien im Rahmen der technischen Investitionsplanung. Shaker, Aachen.

4. P. Hein, F. Vollertsen, 1998. Hydroforming of sheet metal pairs. J. Mater. Proc. Tech., 87(1-3), 154-164.

5. H.-J. Zimmermann, 1991. Fuzzy Set Theory and Its Applications. Kluwer Academic Publishers, Boston.

6. D. Dubois, H. Prade, 1987. Fuzzy Numbers: An Overview. J. C. Bezdek: Analysis of Fuzzy Information, Vol. 1, CRC Press, Boca Raton, Florida.

7. B. Schmidt, 1996. Die objektorientierte Modellspezifikation. B. Schmidt: Sim. in Passau, 1996, 1, 4-17.

8. J. W. Forrester, 1972. Industrial Dynamics. MIT Press, Cambridge, Mass.

9. H. Wildemann, 1987. Strategische Investitionsplanung. Gabler, Wiesbaden.




Manufacturing  Management  Improvement  through 
Rapid  Production  of  Budgets 

E.J.  Colville 

School  of  Engineering,  University  of  Tasmania, 

PO  Box  252/65,  Hobart  7001,  Tasmania,  Australia 
Email:  respub@access.net.au 


ABSTRACT 

This  paper  explains  a  method  whereby  manufacturing  control  of  a  firm  can  be  facilitated  by  rapid 
preparation  of  coordinating  budgets  using  simulation.  Results  of  this  process  applied  to  a  small  organisation 
are reported. By being mathematically integrated, this budgeting system embraces the whole of a firm's
activities and allows investigation of alternative policies to obtain an optimum forward plan. The program
manual and an outline of the budget support software are explained and its use with a PC is demonstrated.

Use  of  this  mathematical  simulation  process  to  produce  an  overall  budget  not  only  provides  detailed 
operating  budgets  but  gives  basic  data  such  as  cost  standards  and  departmental  hourly  rates  for  use  in 
ancillary  marketing,  costing  and  production  planning  programs  and  input  to  progressive  performance 
reports.  Sales,  production,  factory  overhead  and  administration  budgets  for  each  department  are  produced  in 
a  single  run  for  each  proposal.  At  the  same  time  profit  and  loss,  cash  flow  predictions,  balance  sheets  and 
funds  movements  are  all  provided  using  the  budget  program  in  addition  to  cost  standards  and  hourly  rates. 

A  survey  of  budgeting  practice  in  Australia  over  a  range  of  manufacturers  points  to  an  increasing  accent  on 
deriving  overall  plans  as  well  as  input  to  production  planning  and  calculation  of  cost  standards  direct  from 
the  budget.  Many  firms  no  longer  rely  solely  on  production  department  estimates  and  the  current  order 
position  for  their  detailed  planning  input.  A  feature  of  this  procedure  is  its  contribution  to  communications 
between  board  and  shop  floor  and  between  the  marketing,  design,  production  and  finance  departments  with 
the  result  that  middle  management  costs  are  minimised  and  implementation  is  expedited.  General 
management,  accounting  and  engineering  tend  to  be  consolidated  using  this  approach. 

Comments  on  the  application  of  this  budget  support  system  over  the  past  decade  are  reported  and 
recommendations  are  given  for  its  use  as  a  coordinating  and  planning  procedure,  particularly  for  small  to 
medium  size  companies. 

INTRODUCTION 

From the early days of Gantt charts for direct machine and labour loading to the current production
planning schemes such as MRP II, the accent of management has typically been on production standards
based on shop-floor experiences and attitudes. In many text books, manufacturing management and
manufacturing as a whole have too often been equated only to factory production and mechanical skills.

A  new  era  now  opens  up  in  which  at  last  it  is  being  realised  that  overall  management  of  a  manufacturing 
firm  requires  managing  skills  that  embrace  the  shop  floor,  technology  skills  and  business  skills  rather  than 
relying  solely  on  just  one  of  these.  There  is  now  a  global  recognition  that  major  improvement  can  follow  an 
analysis  of  the  whole  of  an  enterprise's  activities  within  the  socio-economic  background  of  its  industry,  both 
locally and internationally. Far too often we have debated the attributes of MRP II and ERP, for example [1],
without  placing  sufficient  emphasis  on  the  overall  input  to  these  programs  by  the  policy  makers.  We  should 
not  rely  on  production  people  to  fill  the  gap  which  top  management  should  define. 

The revolution in recent years in computer hardware, software and the availability of staff with skills in the
programming areas needed now allows immediate calculation of both forward budgets and input to detailed
production planning programs for the execution of current orders. At the same time the policies agreed by
company executives can be incorporated in time for review and allow communication to all who have to
participate in, and implement, the budget.

0-7803-5489-3/99/$10.00 ©1999 IEEE.

It is more than 30 years since the budget research reported in this paper was initiated. Since that time software
and methods of presentation have improved, and at the same time the cost of the facilities and the speed of
system operation have changed dramatically. Original trials ranging from a drilling company to a large cattle and
sheep farm have led to regular use in a printing and publishing company as a laboratory to develop the software, in
particular for presentation and ease of data entry. The Budget Support program [2] simulates the financial flows of the
budget and sets out key plans for marketing, production and financial control. It also provides the standards for labour
and machine loads as well as material purchase requirements to meet current orders and provide a basis for production
planning.

This procedure has verified the program's validity. It assists information flow to meet the increasing need
for  greater  communication  within  firms  which  have  been  used  to  hierarchical  management.  At  the  same  time 
economic  pressure  to  reduce  middle  management  while  still  improving  information  services  requires 
improved  links  between  top  management  and  factory  operations.  Clearly  the  accent  in  manufacturing 
management  will  move  towards  combining  the  "general"  of  overall  management,  and  the  "particular"  of 
shop  floor  activities  to  process  orders  involving  plans  of  the  MRP,  ERP  and  JIT  type  as  well  as  simpler 
production  planning  procedures.  The  consequence  of  this  would  appear  to  be  that  hierarchical  management 
will  be  modified  in  many  democratic  countries  to  provide  a  much  greater  degree  of  participation  and 
improved  team  work.  This  need  for  two  departments  to  work  together  has  been  highlighted  in  a  paper  by 
H. Morino [3], of the Overseas Product Development Division for Toyota, in which he sees the need to
overcome  the  barrier  of  departmentalisation  if  we  are  to  attain  top  quality. 


PROGRAM  DESCRIPTION 

The modelling process and its application, described in references [4], [5], [6] and [7], simulate a
manufacturing concern from the stage of budget coordination, and trials of alternate policies, to adoption of
the budget. The first requirement of integrated manufacturing, so clearly outlined by Gerelle and Stark [8], is
the definition of the strategy to be adopted. "This cannot be achieved unless it is known where the
company's management wants to go, in other words, until the objectives of the company have been
defined". These involve the outward looking marketing objectives concerned with customers and the
inward  looking  innovation  objectives  concerned  with  producing  products  to  satisfy  customers.  The  budgets 
then  follow  for  each  of  the  sales,  production  and  finance  departments.  Control  of  liquidity,  profit  and 
balance  sheet  criteria  is  made  possible  through  regular  comparison  reports  highlighting  differences  between 
budget  and  actual  results  achieved  and  analysis  of  financial  ratios. 

To achieve this, the dollar value, as the common transfer item, is used in equations which define the
interacting  relationships  between  the  departments  of  the  firm.  As  an  example,  sales  quantities  for  the  budget 
period  for  each  subassembly  are  transferred  to  sales  dollars  using  multipliers  involving  price,  distribution 
throughout  the  budget  period,  and  comparison  with  prior  sales  quantities. 

Similarly  materials  required  to  make  each  subassembly  are  calculated  using  the  proportion  of  each  material 
in  each  subassembly,  the  purchasing  distribution  and  the  unit  material  cost.  Wages  are  calculated  from 
expected hours for the staff allocated to each wage department at an expected wage rate. Estimates of
supervision for the direct wages are provided using a matrix which proportions the total of direct wages.
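The conversions described above, from quantities to dollars, can be sketched as follows. This is a simplified illustration and not the Budget Support program itself: the function names, the product data and the omission of the comparison with prior sales quantities and of the purchasing distribution are all assumptions.

```python
def sales_dollars(quantity, price, distribution):
    """Spread the sales dollars of one subassembly over the budget periods.

    distribution holds the fraction of sales expected in each period.
    """
    return [quantity * price * share for share in distribution]


def material_cost(quantity, usage_per_unit, unit_cost):
    """Material dollars from the amount of each material per subassembly."""
    return sum(quantity * usage_per_unit[name] * unit_cost[name]
               for name in usage_per_unit)


def wages(hours, rate):
    """Direct wages of one wage department."""
    return hours * rate


# Hypothetical subassembly: 1000 units at 12 dollars, sold over four periods.
print(sales_dollars(1000, 12.0, [0.25, 0.25, 0.125, 0.375]))
print(material_cost(1000, {"paper": 2.0, "ink": 0.5}, {"paper": 1.5, "ink": 4.0}))
print(wages(160.0, 25.0))
```

Because every quantity ends up in dollars, the outputs of these functions can be summed and compared across the sales, production and finance budgets.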

Factory  overheads  are  set  down  in  their  categories  as  are  administration  accounts  divisions.  Some  variable 
sales  expenses  are  allocated  as  a  proportion  of  total  sales  dollars.  The  actual  opening  balance  sheet  is 
supplied as data so that the simulation can build on it as the budget period progresses. Financial information
which affects the costs, the cash flow and the progressive balance sheets is also entered as data together with
trading  terms  which  affect  cash  flow.  Changes  to  capital,  depreciation,  loans  and  equity  are  also  entered. 




Finally  matrices  are  introduced  that  distribute  overheads  where  they  belong,  and  allow  calculation  of  hourly 
rates  by  departments.  The  names  of  the  budget  components  and  the  periods  which  a  firm  wishes  to  give  to 
its  budget  and  its  products/subassemblies  as  well  as  its  accounts  and  operating  departments  can  also  be 
entered  as  data  to  allow  more  general  application  of  the  simulation  to  a  range  of  industries. 
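The distribution of overheads by a matrix and the resulting hourly rates can be illustrated as follows. The definition of the hourly rate used here (direct wages plus allocated overheads, divided by budgeted hours) is an assumption for the sake of the example, as are the department names, account names and figures.

```python
def hourly_rates(direct_wages, overheads, allocation, hours):
    """Departmental hourly rates from wages and allocated overheads.

    allocation[account][dept] is the fraction of an overhead account
    charged to a department; the fractions of each account sum to 1.
    """
    rates = {}
    for dept in hours:
        share = sum(overheads[account] * allocation[account][dept]
                    for account in overheads)
        rates[dept] = (direct_wages[dept] + share) / hours[dept]
    return rates


# Hypothetical printing firm with two operating departments.
rates = hourly_rates(
    direct_wages={"press": 40000.0, "bindery": 20000.0},
    overheads={"rent": 12000.0, "power": 8000.0},
    allocation={
        "rent": {"press": 0.5, "bindery": 0.5},
        "power": {"press": 0.75, "bindery": 0.25},
    },
    hours={"press": 2000.0, "bindery": 1000.0},
)
print(rates)
```

Since the department and account names are plain data, the same calculation applies to another industry by changing only the dictionaries.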

The  pull-down  menu  of  Figure  1,  from  the  manual  [2]  shows  the  components  of  the  budget  support  system 
made  available  on  the  computer  screen. 


IMPLEMENTATION 

First  it  is  necessary  to  recognise  that  the  initial  data  entry  is  a  key  task  requiring: 

•  detailed  investigation  of  the  firm's  activities 

•  clarification of the terms describing the products, subassemblies and accounts categories of the firm

•  meetings of departmental heads to exchange their needs and the views to be fed into the budget for trials in which
the results of their plans will be calculated, and if necessary modified, before adoption as an agreed budget they
are prepared to positively implement.

•  entering  of  around  1000  data  items  describing  the  budget. 

•  alignment  of  company  book  categories  with  those  of  the  budget  or  vice  versa. 

Next  is  maintenance  and  data  review  in  light  of  new  plans  and  any  change  in  micro  and  macro-economic 
conditions.  While  the  five  initial  installation  items  above  still  apply,  they  usually  take  far  less  time  and  less 
executive  stress  as  the  years  of  budgeting  advance,  since  in  a  particular  firm  certain  variables  are  found  to 
predominate and many of the relationships show only minor variation. It is however necessary to catch up
with the opening balance sheet, as well as new products, changes in emphasis among the product range,
and potential capital investment and repayment plans. Prices, wage rates and unit material costs may need
review.  The  totals  of  immediate  past  actual  costs  should  be  compared  with  those  of  any  new  budget. 

The most important part of a management procedure of this type is to ensure the desired results are
achieved. With the participation of those best able to implement the plan, action within the objectives of the
company is most likely to take place. If people own the plan and believe in it, they are more likely to
push it through despite obstacles on the way.


CASE  STUDIES 

Development  of  the  simulation  has  taken  place  as  part  of  the  management  of  a  printing,  publishing  and 
consulting  company.  At  the  same  time,  there  has  been  the  stimulus  to  provide  management  education,  the 
encouragement  of  the  University  of  Melbourne,  and  the  need  to  grasp  the  opportunities  which  the  computer 
hardware  and  software  revolution  gives  in  providing  real-time  information  for  decision-makers. 

Introduction  of  the  program  has  highlighted  the  following: 

Advantages 

1 .  The  saving  in  infrastructure  costs,  such  as  accounting,  estimating  and  liaison  costs  between  production 
and  supply  people  and  similarly  between  production  and  marketing  people. 

2. Improved participation, and assistance to staff who wish to take responsibility and a direct interest in the firm's future.

3. Ability to meet the needs of macro-economic change by testing the effect of radical changes prior to implementation. An example is a move from vertical to horizontal integration, should a change from a seller's to a buyer's market justify this.

4.  Wise  choice  of  timing  for  both  investment  and  borrowing  to  assist  the  preservation  of  liquidity  and 
hence  the  long  term  security  of  the  company. 




5.  Easy  maintenance  and  a  more  profitable  business. 

6.  Incorporates  such  skills  as  accounting  and  industrial  psychology  which  broaden  the  role  of  engineers 
with  IT  aptitudes,  and  facilitates  their  management  of  technology  based  enterprises. 

7.  Applicable  to  a  wide  range  of  industries  by  simply  changing  names  and  data  since  accounting  principles  are  similar. 

Disadvantages 

1. Resistance to acceptance can occur because the process yields long-term benefit and needs consultation, whereas many general managers concentrate on short-term returns. This can cause initial installation and acceptance to be spread over several months.

2.  Firms  inculcated  with  a  hierarchical  approach  to  management  can  be  frightened  of  sharing  information 
and  improving  communication  which  is  fundamental  to  the  budget  support  process. 

3.  The  coordinator/facilitator  introducing  the  process  needs  to  have  an  insight  into  computer  programming 
as  well  as  management  needs. 

4.  The  internal  production  of  budgets  involving  full  coordination  may  be  resented  by  those  external 
professionals,  such  as  accountants,  previously  responsible  for  a  more  simplified  process  without  input 
from  the  key  staff  responsible  for  implementation. 

5. The problem of the “Our business is different” attitude in many parochial firms.

DEVELOPMENT  PLANS 

While there are always items for technical improvement, the following will receive attention in the immediate future in view of the effectiveness of the methods described.

1 .  Attention  to  the  wider  commercial  application  of  the  simulation. 

2.  Reducing  the  time  of  installation  by  setting  out  proforma  procedures  for  obtaining  data,  for  meeting 
agendas  and  to  encourage  participation. 

3. Defining a standard method of adjusting the budgeted balance sheet for any anomaly arising from the difference between the carefully calculated stock figures of the simulation and the sometimes arbitrary estimate in the company's books.

4. Enabling the maintenance of the budget to be expedited by defining, for a particular company, the more important data items to be reviewed regularly, rather than being concerned with the statistically less variable data which may need major change only when the company's affairs are reconstructed.


CONCLUSION 

It has been shown that the Budget Support program described meets two important needs of current management. First, it provides economic budgets quickly as business climates change. Second, if combined with a high degree of participation by senior executives and their staff, efficient implementation takes place as a result of teamwork throughout the plant. The procedures outlined shorten the lines of communication within the firm, so that flexibility, so important for competitiveness in the current global economy, is enhanced, particularly for small to medium-size companies where economic infrastructure is imperative.

REFERENCES 

1. C. Hume, D. Paynter, 1989. An investigative analysis of Just in Time, MRPII, and their integration. University of Melbourne Special Project, 45, 113.

2. E.J. Colville, 1997. Budget Support - A Management and Development Program. Research Publications, Melbourne.

3. H. Morino, 1999. Total Simultaneous Management through Marketing to Development, Engineering and Production. Proceedings IPC 10, SAE, Melbourne. Paper 99056.

4. E.J. Colville, 1992. Mathematical Simulation of a Manufacturing Concern - An Important Part of Management Education. ACEME 1, Sydney.

5. E.J. Colville, 1986. A Management Information System for Small Manufacturers. SAE International Conference, Auckland, New Zealand.

6. E.J. Colville, 1995. Production Budgets and Costing Standards using Mathematical Simulation from Financial Data. 13th International Conference on Production Research, Jerusalem, Israel.

7. E.J. Colville, 1998. Computerised Master Planning for a Manufacturing Firm and its Application to Factory and Sales Departments. International Conference on Mechanical Engineering, Tehran, Iran.

8. E.G.R. Gervelle, J. Stack, 1988. Integrated Manufacturing. McGraw Hill, 28.




A  Connectionist  Method  to  Solve  Job  Shop  Problems 

Marko  Fabiunke,  Gerd  Kock 

GMD  FIRST  Berlin 
Rudower  Chaussee  5,  12489  Berlin 
Email: marko@first.gmd.de, gerd@first.gmd.de


ABSTRACT 

We  propose  a  novel  framework  to  solve  job  shop  scheduling  problems  based  on  connectionist  ideas  of 
distributed  information  processing.  In  our  approach  each  operation  of  a  given  job  shop  problem  is 
considered  to  be  a  simple  agent  looking  for  a  position  in  time  such  that  all  its  time  and  resource  constraints 
are  satisfied.  Each  agent  considers  the  current  time  position  of  its  constraint  neighbors  to  gradually  change 
its  own  position  to  reach  this  goal.  All  agents  together  form  a  recurrent  dynamical  system  which  either  self- 
organizes  after  some  iterations  to  a  feasible  schedule  or  fails  to  do  so  depending  on  the  constraints  of  the 
problem.  By  gradually  increasing  the  constraints  through  decreasing  the  allowable  overall  processing  time 
for  a  valid  schedule,  better  and  better  solutions  are  found  up  to  the  point  where  no  further  improvements 
can  be  made. 


INTRODUCTION 

In  classical  job  shop  scheduling  we  are  given  a  set  of  jobs,  each  of  which  consists  of  a  chain  of  operations, 
and  a  set  of  machines  being  able  to  process  at  most  one  operation  at  a  time.  Each  operation  needs  to  be 
processed  on  a  given  machine  and  during  an  uninterrupted  time  period  of  given  length.  All  jobs  have  to  be 
finished  at  a  given  latest  completion  time.  The  goal  is  to  find  an  assignment  of  operations  to  starting  points 
(schedule)  such  that  neither  the  machine  capacity  constraints  nor  the  job  precedence  constraints  are 
violated.  A  schedule  is  called  optimal  if  it  minimizes  the  overall  processing  time  (makespan)  of  all  jobs. 

Job shop problems are known to be NP-hard to solve to optimality and have the reputation of being among the most computationally stubborn combinatorial problems. Complete search algorithms are usually unable to solve problems with more than a few operations. For practical reasons it is therefore acceptable to apply heuristic methods to find approximations to the optimal solution, although even this remains difficult: according to [1] it is NP-hard to find a schedule that is shorter than 5/4 times the optimum. Consequently, schedules generated by enumerative heuristics based on simple dispatching rules are often still far from the optimum.

Currently we distinguish between two major heuristic approaches to solving job shop problems. Schedule construction is best done through a combination of classical operations research methods with elaborate global constraint preprocessing techniques to guide the process of construction and to avoid the need for backtracking as much as possible. This idea has found its best expression and most success in the use of constraint logic programming as a single framework to combine both methodologies [2, 3]. Schedule repairing uses local optimization techniques like simulated annealing, tabu search, guided local search and genetic algorithms to empirically create solutions starting from some initially found schedule [4, 5]. Those techniques employ the common idea that a given solution may be improved by making small changes. Better and better solutions are found by repeating this process over and over again. Empirically, local search techniques have proven to be very effective in solving large-scale combinatorial problems.

The scheduling methodology presented here clearly belongs to the local search group. Starting with some initial assignment for all starting points, a feasible schedule not exceeding a given maximal makespan is generated using a self-organizing algorithm based on connectionist ideas. In this algorithm, each operation of a given job shop problem is considered to be a simple agent (or processing unit) looking for a position in time such that all its time and resource constraints are satisfied. To achieve this, each agent considers the current time positions of its constraint neighbors (the other agents) to gradually change its own position. In contrast to sequential local search methods, all agents work simultaneously and therefore form a highly recurrent dynamical system of parallel working units, which should self-organize after a while to a feasible schedule but may fail to do so if the problem becomes too tightly constrained.

0-7803-5489-3/99/$10.00 ©1999 IEEE.

Decreasing  the  allowable  maximal  makespan  will  gradually  increase  the  degree  of  constraining  and 
produce  better  and  better  schedules  as  long  as  the  self-organizing  algorithm  converges. 

Compared with the neural network scheduling approach of Zhou et al. [6] and the multi-agent scheduling system of Liu and Sycara [7], our distributed framework has a hybrid flavor. Our processing units are neither "intelligent" enough to be called agents (in the usual sense), nor do they meet the ordinary definitions of "artificial neurons". Moreover, although information in our framework is distributed much like signals in neural networks, it cannot be interpreted simply as unit activations. However, we prefer to call the whole a "connectionist" algorithm, since its design has been much influenced by the connectionist idea of simple processing units, where each unit is responsible for one local piece of information and all units act simultaneously, influencing each other to achieve a common global goal.

Accordingly, the algorithm has been implemented using CONNECT, a specification language for connectionist systems, which was originally designed for the development of artificial neural networks. CONNECT allows for flexible definitions of networks of simple processing units, each of them communicating with the others by sending signals, and can be applied to design neural networks, cellular automata and other simple distributed systems. Another interesting feature of CONNECT is that it comes with a graphical user interface (GUI) library of C++ modules, which can easily be used to visualize the progress of schedule development; for heuristic approaches this is an important point. The complete package of CONNECT and the GUI library is part of the NeuroLution system. For further details see [8, 9].


PROBLEM  DESCRIPTION 

We start with a formal definition of the classical job shop scheduling problem as used in this paper. Let $J$ be a set of $n$ jobs and $R$ be a set of $m$ resources (machines). Each job $j_i \in J$ consists of a chain of $m$ operations $o_{i1}, \ldots, o_{im}$, so that we are given a total of $n \times m$ operations to be scheduled.

Each operation $o_{ij}$ requires for processing a resource $r_{ij} \in R$ and will block this resource for other operations over a period of $d_{ij} \in \mathbb{N}$ time units. Any two different operations belonging to the same job do not require the same resource, i.e. $r_{ij} \neq r_{ik}$ for $j \neq k$. The operations in one job have to be processed in their given order, defining a precedence relation $\prec$ on the set of all operations $O$ as follows:

$$o_{ij} \prec o_{kl} \iff i = k \land j + 1 = l.$$

Our goal is to assign to each operation $o_{ij}$ a discrete starting position in time $x_{ij} \in \mathbb{N}$ out of some given period of possible positions $X_{ij} = [b_{ij}, e_{ij}]$. We call the vector of all starting positions $x = (x_{11}, \ldots, x_{nm})$ a schedule and the corresponding domain $X = X_{11} \times \cdots \times X_{nm}$ the set of all schedules. A schedule is called feasible if it satisfies the following constraints:

$$o_{ij} \prec o_{kl} \;\Rightarrow\; x_{ij} + d_{ij} \le x_{kl} \qquad (1)$$

$$r_{ij} = r_{kl} \;\Rightarrow\; x_{ij} + d_{ij} \le x_{kl} \;\lor\; x_{kl} + d_{kl} \le x_{ij} \qquad (2)$$

The set of all feasible schedules (problem solutions) will be denoted by $L \subseteq X$. The overall processing time (makespan) of a (feasible) schedule is given by the function $\tau(x) = \max_i \{x_{im} + d_{im}\}$. A feasible schedule $x$ is called optimal if it minimizes the makespan ($\forall \hat{x} \in L: \tau(x) \le \tau(\hat{x})$).
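The definitions above translate directly into a feasibility test. The following is a minimal Python sketch, not the authors' CONNECT implementation: the dictionary layout keyed by (job, position), the function names `is_feasible` and `makespan`, and the tiny 2 x 2 instance are all assumptions made for illustration.

```python
from itertools import combinations

def is_feasible(x, d, r):
    """Check constraints 1 and 2 for a schedule.

    x, d, r map an operation key (job, position) to its start time,
    duration and machine (an illustrative layout, not from the paper).
    """
    # Constraint 1: an operation must finish before its job successor starts.
    for (i, j) in x:
        succ = (i, j + 1)
        if succ in x and x[(i, j)] + d[(i, j)] > x[succ]:
            return False
    # Constraint 2: two operations on the same machine must not overlap.
    for a, b in combinations(x, 2):
        if r[a] == r[b] and not (x[a] + d[a] <= x[b] or x[b] + d[b] <= x[a]):
            return False
    return True

def makespan(x, d):
    """Latest completion time. For a feasible schedule this equals the
    maximum completion time over the last operations of all jobs."""
    return max(x[op] + d[op] for op in x)

# Hypothetical 2-job, 2-machine instance.
d = {(0, 0): 3, (0, 1): 2, (1, 0): 2, (1, 1): 4}
r = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
x_good = {(0, 0): 0, (0, 1): 3, (1, 0): 0, (1, 1): 3}
x_bad = dict(x_good)
x_bad[(1, 1)] = 2  # now overlaps operation (0, 0) on machine 0

print(is_feasible(x_good, d, r), makespan(x_good, d))
print(is_feasible(x_bad, d, r))
```

The pairwise machine check is quadratic in the number of operations, which is acceptable for instances of the size discussed here.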

SELF-ORGANIZING  TO  A  FEASIBLE  SCHEDULE 

In our distributed framework each operation $o_{ij}$ of a given problem is considered to be a simple processing unit (agent), trying to find some value for the local variable $x_{ij}$ out of the domain $X_{ij}$ such that constraints 1 and 2 that apply to $x_{ij}$ are satisfied. We start with an initial (possibly random) start position $x^0_{ij}$ for each operation $o_{ij}$. To evaluate the current position, each agent needs to know the position of the other agents. We therefore have to distribute this information among agents much like distributing activations in neural networks. To be more precise, an agent needs to know just the position of its constraint neighbors, which are its immediate predecessor and successor in the job ($o_{i,j-1} \prec o_{ij} \prec o_{i,j+1}$) as well as the operations being processed on the same machine ($o_{kl}: r_{ij} = r_{kl}$). This reduces the number of connections, since each agent is connected to at most $n + 1$ other agents. For simplicity we denote the constraint neighbors of some agent $o_{ij}$ by $O_{ij}$ and their subrange of time positions by $\bar{x}_{ij}$.

As  we  know  the  positions  of  our  constraint  neighbors,  we  can  evaluate  our  own  position.  For  this  purpose 
we  define  a  constraint  violation  measure  which  we  call  constraint  inconsistency. 


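$$\phi_c(x_{ij}, x_{kl}) = \begin{cases} \max(0,\ \min(x_{ij} + d_{ij},\ x_{kl} + d_{kl}) - \max(x_{ij}, x_{kl})) & : r_{ij} = r_{kl} \\ \max(0,\ x_{ij} + d_{ij} - x_{kl}) & : o_{ij} \prec o_{kl} \\ \max(0,\ x_{kl} + d_{kl} - x_{ij}) & : o_{kl} \prec o_{ij} \end{cases} \qquad (3)$$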

It is not difficult to give a descriptive interpretation of the above formula. Assuming that $o_{ij}$ and $o_{kl}$ have to be processed on the same machine ($r_{ij} = r_{kl}$), they block this machine during the time windows $[x_{ij}, x_{ij} + d_{ij})$ and $[x_{kl}, x_{kl} + d_{kl})$. If those windows overlap we have a resource conflict, since a machine can process just one operation at a time. We therefore calculate the length of this overlap as a measure of constraint violation. Something similar is done with the precedence constraints $o_{ij} \prec o_{kl}$ and $o_{kl} \prec o_{ij}$, respectively. Here we calculate how many time units we have to shift an operation to the left or to the right such that the corresponding time windows are in the desired precedence order.
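The three cases of the constraint inconsistency can be written as one small function. This is an illustrative Python sketch; the function name `phi_c` and the string tags used to select the case are assumptions, not part of the original implementation.

```python
def phi_c(x_a, d_a, x_b, d_b, relation):
    """Constraint inconsistency between operation a and its neighbor b.

    relation is one of (hypothetical tags):
      "same_machine": a and b need the same resource
      "before":       a must precede b in the job
      "after":        b must precede a in the job
    """
    if relation == "same_machine":
        # Length of the overlap of the two time windows, zero if disjoint.
        return max(0, min(x_a + d_a, x_b + d_b) - max(x_a, x_b))
    if relation == "before":
        # Time units by which a runs into the start of b.
        return max(0, x_a + d_a - x_b)
    if relation == "after":
        # Time units by which b runs into the start of a.
        return max(0, x_b + d_b - x_a)
    raise ValueError(f"unknown relation: {relation}")

# Windows [0, 3) and [2, 6) overlap by one time unit.
print(phi_c(0, 3, 2, 4, "same_machine"))
```

The value is zero exactly when the corresponding constraint is satisfied, which is what allows it to serve as a violation measure.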

We sum the individual constraint inconsistencies to form a local inconsistency measure for the time position of some operation $o_{ij}$ (Eq. 4). Similarly, we sum all local inconsistencies to form a global inconsistency measure for the schedule formed by the individual time positions (Eq. 5):

$$\phi_l(x_{ij}, \bar{x}_{ij}) = \sum_{o_{kl} \in O_{ij}} \phi_c(x_{ij}, x_{kl}) \qquad (4)$$

$$\phi(x) = \sum_{i=1}^{n} \sum_{j=1}^{m} \phi_l(x_{ij}, \bar{x}_{ij}) \qquad (5)$$

Note that since the constraint inconsistency is a non-negative function, $\phi_c(x_{ij}, x_{kl}) \ge 0$, the same holds for the local and global inconsistency. Moreover, a schedule is feasible iff $\phi(x) = 0$, which in turn is equivalent to all local inconsistencies being zero. Hence, the local goal each agent has to follow is the minimization of its local inconsistency $\phi_l(x_{ij}, \bar{x}_{ij})$. This is not easy to achieve, since an agent knows the time positions of its constraint neighbors but cannot change them. The only thing it can do is to use some local rule $x^{t+1}_{ij} = \alpha(x^t_{ij}, \bar{x}^t_{ij})$ to adjust its own position in time, obtaining a (possibly) better position with respect to the current situation. Since all agents do this simultaneously, we end up in a completely new situation which may even be worse than the previous one. But this is not a drawback, since strict "hill-climbing" is not a desired feature for any local search method. Instead, we hope that "mistaken" moves are repaired in the following iterations, such that local and global inconsistency minimization is achieved over a longer period of iterations, finally ending with a feasible schedule.
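The local and global inconsistency measures can be sketched as follows. The Python below is only an illustration under stated assumptions: operations are keyed by (job, position), the helper `relation` and the example instance are hypothetical, and every pairwise conflict is counted once from each side, as the double sum over all agents implies.

```python
def relation(a, b, r):
    """How neighbor b constrains operation a, or None if it does not."""
    (i, j), (k, l) = a, b
    if i == k and l == j + 1:
        return "before"        # a precedes b in the same job
    if i == k and l == j - 1:
        return "after"         # b precedes a in the same job
    if a != b and r[a] == r[b]:
        return "same_machine"
    return None

def phi_c(a, b, x, d, r):
    rel = relation(a, b, r)
    if rel == "same_machine":
        return max(0, min(x[a] + d[a], x[b] + d[b]) - max(x[a], x[b]))
    if rel == "before":
        return max(0, x[a] + d[a] - x[b])
    if rel == "after":
        return max(0, x[b] + d[b] - x[a])
    return 0                   # b is not a constraint neighbor of a

def phi_l(a, x, d, r):
    """Local inconsistency: sum over the constraint neighbors of a."""
    return sum(phi_c(a, b, x, d, r) for b in x)

def phi(x, d, r):
    """Global inconsistency: sum of all local inconsistencies."""
    return sum(phi_l(a, x, d, r) for a in x)

# Hypothetical 2-job, 2-machine instance.
d = {(0, 0): 3, (0, 1): 2, (1, 0): 2, (1, 1): 4}
r = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
x_good = {(0, 0): 0, (0, 1): 3, (1, 0): 0, (1, 1): 3}
x_bad = {(0, 0): 0, (0, 1): 3, (1, 0): 0, (1, 1): 2}

print(phi(x_good, d, r), phi(x_bad, d, r))
```

In the infeasible example a single one-unit overlap on machine 0 is seen by both agents involved, so it contributes to the global measure twice.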


What remains is to choose the local update rule $\alpha$ such that (1) a feasible schedule $x$, i.e., a schedule with $\phi(x) = 0$, is a fixed point of $\alpha$ and (2) the recurrent dynamical system $x^{t+1} = \alpha(x^t)$ is likely to converge to such a fixed point. Currently there is no simple or analytical way to do so. Of course, we could choose $\alpha$ to be the local conflict minimization rule

$$x^{t+1}_{ij} = \arg\min_{x_{ij} \in X_{ij}} \phi_l(x_{ij}, \bar{x}^t_{ij})$$

which is used as the basic update rule in the discrete Hopfield neuron model as well as in many sequential local search algorithms. However, we should not expect this to work.


Fig. 1. Progression of $\phi(x^t)$ using $\alpha_1$ on a loosely constrained problem.

Fig. 2. Progression of $\phi(x^t)$ using $\alpha_1$ and $\alpha_2$ on a more tightly constrained problem.


For  example,  in  a  similar  connectionist  approach,  applied  by  the  authors  to  the  non-attacking  queens 
problem,  applying  local  conflict  minimization  simultaneously  had  the  undesired  effect  that  all  queens 
increased  the  number  of  conflicts  instead  of  decreasing  them.  However,  with  a  different  update  rule  we 
have  been  able  to  end  up  with  a  conflict-free  chess  board. 

The update rule $\alpha$ for the scheduling problem presented here has been found empirically and should not be assumed to be the only or even the best one. The results obtained with this rule have been promising, but further investigation may lead to better rules. Our rule $\alpha$ is a combination of two independent rules $\alpha_1$ and $\alpha_2$. $\alpha_1$ is a simple shifting rule. Let $o_{kl}$ be a constraint neighbor of $o_{ij}$; then we can calculate the minimal shift necessary to obtain a position satisfying the constraint as follows:

$$\delta(x_{ij}, x_{kl}) = \begin{cases} -\max(0,\ x_{ij} + d_{ij} - x_{kl}) & : (o_{ij} \prec o_{kl}) \lor (r_{ij} = r_{kl} \land x_{ij} + d_{ij}/2 < x_{kl} + d_{kl}/2) \\ \phantom{-}\max(0,\ x_{kl} + d_{kl} - x_{ij}) & : (o_{kl} \prec o_{ij}) \lor (r_{ij} = r_{kl} \land x_{ij} + d_{ij}/2 \ge x_{kl} + d_{kl}/2) \end{cases} \qquad (6)$$

We sum all shifts to form a cumulative shift, which is finally applied to obtain a new position:

$$\alpha_1(x_{ij}, \bar{x}_{ij}) = \max\Bigl(b_{ij},\ \min\Bigl(e_{ij},\ x_{ij} + \sum_{o_{kl} \in O_{ij}} \delta(x_{ij}, x_{kl})\Bigr)\Bigr) \qquad (7)$$
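One synchronous application of the shifting rule can be sketched in a few lines. The Python below is an illustration only, under stated assumptions: same-machine neighbors are ordered by comparing window midpoints, ties are resolved by shifting to the right, and the names `relation`, `delta`, `alpha1`, `synchronous_step` as well as the bounds `lo` and `hi` (standing for the earliest and latest positions) are hypothetical.

```python
def relation(a, b, r):
    """How neighbor b constrains operation a, or None if it does not."""
    (i, j), (k, l) = a, b
    if i == k and l == j + 1:
        return "before"
    if i == k and l == j - 1:
        return "after"
    if a != b and r[a] == r[b]:
        return "same_machine"
    return None

def delta(a, b, x, d, r):
    """Minimal shift of a that resolves its conflict with neighbor b."""
    rel = relation(a, b, r)
    if rel is None:
        return 0
    mid_a = x[a] + d[a] / 2
    mid_b = x[b] + d[b] / 2
    if rel == "before" or (rel == "same_machine" and mid_a < mid_b):
        return -max(0, x[a] + d[a] - x[b])   # move a to the left
    return max(0, x[b] + d[b] - x[a])        # move a to the right

def alpha1(a, x, d, r, lo, hi):
    """Apply the cumulative shift and clip to the allowed window."""
    shift = sum(delta(a, b, x, d, r) for b in x)
    return max(lo[a], min(hi[a], x[a] + shift))

def synchronous_step(x, d, r, lo, hi):
    """All agents update at once, each seeing only the old positions."""
    return {a: alpha1(a, x, d, r, lo, hi) for a in x}

# Hypothetical 2-job, 2-machine instance with one machine conflict.
d = {(0, 0): 3, (0, 1): 2, (1, 0): 2, (1, 1): 4}
r = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
lo = {op: 0 for op in d}
hi = {op: 20 for op in d}
x_bad = {(0, 0): 0, (0, 1): 3, (1, 0): 0, (1, 1): 2}

print(synchronous_step(x_bad, d, r, lo, hi))
```

In this small example operation (0, 0) is already at its earliest position and cannot move left, while operation (1, 1) moves one unit to the right, so a single step removes the conflict. On larger instances simultaneous shifts can interfere with each other, which is why many iterations are usually needed.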


Figure 1 shows the typical progression of $\phi(x^t)$ for a loosely constrained problem where all agents apply $\alpha_1$ simultaneously. Although the function seems to climb up and down erratically, we see a clear process of constraint relaxation culminating in a feasible schedule. Increasing the degree of constraint on the problem may lead to the undesired effect of getting stuck in a local minimum or in a cyclic repetition of infeasible schedules. To escape from this situation we use the second update rule $\alpha_2$, which is a modified local conflict minimization rule based just on the resource constraints:

$$\alpha_2(x_{ij}, \bar{x}_{ij}) = \arg\min_{x_{ij} \in X_{ij}} \sum_{o_{kl}:\, r_{kl} = r_{ij}} \phi_c(x_{ij}, x_{kl}) \qquad (8)$$


Since a local agent cannot detect whether the global algorithm has been trapped in a local minimum or a cycle of infeasible schedules, each agent uses an internal strategy to combine both rules in the intended manner. Update rule $\alpha_1$ is the preferred one and is applied most of the time. However, if an agent detects that it has not reached a conflict-free position over a longer period of iterations, it applies the second rule to jump to another region of the search space and escape from the current trap. More precisely, each time $\alpha_1$ is applied the local inconsistency is accumulated, and when the accumulator reaches a threshold $\beta > 0$, the second rule $\alpha_2$ is applied once instead of $\alpha_1$ and the accumulator is reset to zero. The effect of this strategy can be seen in Figure 2. Around iteration 475 the algorithm gets trapped in a local minimum. Some agents cannot find a conflict-free position and change their update policy some 80 iterations later, kicking off the relaxation process again, which reaches a feasible schedule some time later. A smaller $\beta$ value encourages agents to apply $\alpha_2$ more often, hence a trap situation can be escaped earlier. On the other hand, a small value for $\beta$ disturbs the inconsistency relaxation process of $\alpha_1$ too much, hence $\beta$ has to be chosen such that a balance between both processes is obtained.
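The accumulator strategy and the resource-only minimization rule can be illustrated as follows. This is a hedged Python sketch rather than the original CONNECT code: the class name `RuleSelector`, the function `alpha2`, and the tie-breaking choice (the earliest minimizing position wins) are assumptions that the text does not specify.

```python
def overlap(x_a, d_a, x_b, d_b):
    """Length of the overlap of two time windows, zero if disjoint."""
    return max(0, min(x_a + d_a, x_b + d_b) - max(x_a, x_b))

def alpha2(d_op, lo, hi, machine_neighbors):
    """Position in [lo, hi] with minimal total overlap on the machine.

    machine_neighbors is a list of (start, duration) pairs of the other
    operations on the same machine. Ties go to the earliest position
    (an assumption made for this sketch).
    """
    def conflict(t):
        return sum(overlap(t, d_op, xs, ds) for xs, ds in machine_neighbors)
    return min(range(lo, hi + 1), key=conflict)

class RuleSelector:
    """Per-agent choice between the shifting rule and the jump rule."""

    def __init__(self, beta):
        self.beta = beta      # threshold, must be positive
        self.acc = 0          # accumulated local inconsistency

    def next_rule(self, local_inconsistency):
        if self.acc >= self.beta:
            self.acc = 0      # apply the jump rule once, then start over
            return "alpha2"
        self.acc += local_inconsistency
        return "alpha1"

# An agent that keeps seeing a local inconsistency of 2, with beta = 5.
selector = RuleSelector(beta=5)
rules = [selector.next_rule(2) for _ in range(5)]
print(rules)

# Machine occupied during [0, 3) and [3, 5); place a 2-unit operation.
print(alpha2(2, 0, 10, [(0, 3), (3, 2)]))
```

An agent whose local inconsistency stays at zero never accumulates anything and therefore never leaves the shifting rule, which matches the intent that only trapped agents jump.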


MAKESPAN  MINIMIZATION 

To obtain schedules with near-optimum makespans we can use the following basic procedure. An initial maximal makespan $\tau_0$, large enough to ensure that a feasible schedule can easily be found, is chosen. The earliest and latest time positions for each operation $o_{ij}$ are calculated according to the following formulas:

$$b_{ij} = \sum_{k=1}^{j-1} d_{ik}, \qquad e_{ij} = \tau_0 - \sum_{k=j}^{m} d_{ik} \qquad (9)$$

The position of each operation is initialized with the earliest possible position, $x^0_{ij} = b_{ij}$. Our self-organizing algorithm is applied and a first feasible schedule is found. This schedule can be improved by moving each operation as far as possible to an earlier position without changing the current order of operations on the machines. From the makespan of this improved schedule $x_0$ a new maximal makespan $\tau_1 = \tau(x_0) - \Delta\tau$ is calculated, where $\Delta\tau$ denotes some constant makespan reduction value. All agents exceeding this new makespan calculate a new position according to update rule $\alpha_2$. Another run of the self-organizing algorithm is performed, and these steps are repeated as long as the self-organizing algorithm converges.
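The time windows and the outer tightening loop can be sketched as follows. This Python is illustrative only: `time_windows` uses 0-based indices, and `minimize_makespan` takes the inner solver, the left-compaction step and the makespan function as hypothetical callables. It also restarts the solver for each bound, whereas the procedure described above re-positions only the agents that exceed the new bound.

```python
def time_windows(durations, tau):
    """Earliest (b) and latest (e) start of every operation.

    durations[i][j] is the duration of the j-th operation of job i;
    tau is the current maximal makespan.
    """
    b, e = {}, {}
    for i, job in enumerate(durations):
        for j in range(len(job)):
            b[(i, j)] = sum(job[:j])        # all predecessors must fit before
            e[(i, j)] = tau - sum(job[j:])  # itself and all successors must fit after
    return b, e

def minimize_makespan(self_organize, compact, makespan, tau0, delta_tau):
    """Tighten the makespan bound until the inner solver fails.

    self_organize(tau) returns a feasible schedule or None (assumed
    interface), compact(schedule) shifts operations to the left, and
    makespan(schedule) evaluates the result.
    """
    if delta_tau <= 0:
        raise ValueError("delta_tau must be positive")
    tau, best = tau0, None
    while True:
        schedule = self_organize(tau)
        if schedule is None:
            return best                     # last schedule that converged
        best = compact(schedule)
        tau = makespan(best) - delta_tau

b, e = time_windows([[3, 2], [2, 4]], tau=10)
print(b)
print(e)
```

A larger reduction value reaches tight bounds in fewer runs but risks skipping over bounds for which the inner algorithm would still have converged.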


EXPERIMENTAL  RESULTS 

We applied our method to some 10 x 10 benchmark problems taken from the operations research library (http://www.ms.ic.ac.uk/info.html). Our intention was to come as close as possible to the (known) optimum schedule. Since our algorithm is deterministic in its current version, we performed several runs with different parameter settings, varying the makespan reduction value $\Delta\tau$ as well as the parameter $\beta$. Table 1 summarizes the results, showing the best and average makespan obtained and the known optimum for each problem.

Each run took about 9 seconds on a 233 MHz i686 PC; the best schedule usually emerged after half that time. Taking the best solution found for each benchmark, the generation of schedules differing by 3% (on average) from the optimum seems to be within the reach of our approach.


Table 1. Experimental results on a set of 10 x 10 benchmark problems.

Problem   Optimum   Best   Average
abz5      1234      1260   1295
abz6       943       976    995
ft10       930       951    989
la16       945       982   1012
la17       784       794    826
la18       848       862    905
la19       842       870    890
la20       902       922    932
orb01     1059      1121   1171
orb02      888       889    912
orb03     1005      1075   1123
orb04     1005      1050   1076
orb05      887       916    954
orb06     1010      1033   1056
orb07      397       409    422
orb08      899       934    955
orb09      934       971   1005
orb10      944       979   1000
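The average gap quoted in the text can be recomputed from the values in Table 1. The short Python sketch below is an illustration; it assumes the first numeric value listed for each problem is the known optimum and the second the best makespan obtained.

```python
# (optimum, best makespan found) per benchmark, taken from Table 1.
results = {
    "abz5": (1234, 1260), "abz6": (943, 976), "ft10": (930, 951),
    "la16": (945, 982), "la17": (784, 794), "la18": (848, 862),
    "la19": (842, 870), "la20": (902, 922), "orb01": (1059, 1121),
    "orb02": (888, 889), "orb03": (1005, 1075), "orb04": (1005, 1050),
    "orb05": (887, 916), "orb06": (1010, 1033), "orb07": (397, 409),
    "orb08": (899, 934), "orb09": (934, 971), "orb10": (944, 979),
}

# Relative deviation of the best schedule from the optimum, in percent.
deviations = {
    name: (best - optimum) / optimum * 100
    for name, (optimum, best) in results.items()
}
mean_deviation = sum(deviations.values()) / len(deviations)

print(f"mean deviation of best schedules: {mean_deviation:.1f}%")
```

The mean comes out slightly above 3%, in line with the figure given in the text.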

In a second experiment we examined whether we can compete with other algorithms. A set of 3000 10 x 10 job shop scheduling problems was generated and solved with a schedule construction algorithm based on constraint logic programming (CLP) that is known to produce good solutions [3].

Each of the 27 available variable ordering heuristics was applied and the best schedule obtained was compared with the solution generated by our algorithm. We made two runs: one with the parameters $\Delta\tau$ and $\beta$ that had proven best for the ft10 benchmark problem, and a second, modified run that additionally allowed repetitions based on random re-initializations after self-organization had failed.

In the first run we improved on the CLP results in 42% of all cases, while on average we obtained comparable solutions (1.2% deviation towards poorer schedules). In the second run we improved these values to 65% and -1.25% (towards better schedules), which shows that our search method can still be improved. Note that compared with each individual CLP heuristic our algorithm was superior in terms of robustness and generated solution quality, which has been between 5% and 12% for our set of 3000 test problems.


SUMMARY 

We have proposed a novel distributed framework based on connectionist ideas of information processing to solve the classical job shop scheduling problem. Despite its simplicity, our framework seems to be a promising area for further research. Experimental results on standard benchmark problems and large problem test sets have shown that our method can compete with other methods in terms of generated solution quality as well as computational costs. Whereas solution quality is on average not superior to other methods, the schedules are obtained very fast, since a typical run of the algorithm lasts only a few seconds. According to the performance comparison in [4], we may well have an algorithm among the fastest local search algorithms currently available.

Since our algorithm is simple, intuitive and easy to implement, it can be seen as a low-cost alternative to other good scheduling methods that are based on extensive and expensive software tools (e.g. constraint solvers). Because it is a connectionist system of simultaneously acting agents, there is also room for effective parallelization, especially on multiprocessor systems, which makes the approach attractive for real-time applications.


REFERENCES 

1. D.P. Williamson, L.A. Hall, J.A. Hoogeveen, C.A.J. Hurkens, J.K. Lenstra, S.V. Sevastjanov, and D.B. Shmoys, 1996. Short shop schedules. Operations Research, 25.

2. S. Breitinger and H.C.R. Lock, 1994. Improving search for job-shop-scheduling with CLP(FD). In M. Hermenegildo and J. Penjam, editors, Programming Language Implementation and Logic Programming, vol. 844 of Lecture Notes in CS, Springer, 277-291.

3. H.J. Goltz and U. John, 1996. Methods for solving practical problems of job-shop scheduling modelled in CLP(FD). In Proceedings of the Conference on Practical Application of Constraint Technology, London, 73-92.

4. R.J.M. Vaessens, E.H.L. Aarts, and J.K. Lenstra, 1996. Job shop scheduling by local search. INFORMS Journal on Computing, 8(3), 302-317.

5. E. Aarts and J.K. Lenstra, eds., 1997. Local Search in Combinatorial Optimization. John Wiley & Sons.

6. D.N. Zhou, V. Cherkassky, T.R. Baldwin, and D.W. Hong, 1990. Scaling neural network for job-shop scheduling. In Proc. of the Int. Joint Conference on Neural Networks, 3, 889-894, San Diego.

7. J.-S. Liu and K.P. Sycara, 1995. Multiagent coordination in tightly coupled real-time environments. In Victor Lesser, editor, Proc. of the Int. Conference on Multi-Agent Systems. MIT Press.

8. G. Kock and T. Becher, 1997. An integrated environment for the development and acceleration of neuro-fuzzy systems. In Australasia-Pacific Forum on Intelligent Processing and Manufacturing of Materials (IPMM97).

9. G. Kock and T. Becher, 1998. Flexible neuro-fuzzy simulation based on abstract model and interface definitions. Cybernetics and Systems: An International Journal (CBS), 29(7), 689-714.




Fuzzy  Systems  II 




Designing  in  Many-Valued  Logic 

Antonio  Donnarumma;  Michele  Pappalardo 

Department  of  Mechanical  Engineering,  University  of  Salerno 
Via  Ponte  don  Melillo,  84084  Fisciano  ,  (Salerno),  Italy 


ABSTRACT 

The analysis followed here is based on the many-valued logic of Lukasiewicz. It aims at the
construction of a simple design model for cases where the analysis cannot be based upon a two-valued logic. The
reference is to the semantics of Kripke (immersion in a definite possible world) and to the process of
verification and confirmation of Carnap. The example is based on the statistics of Dempster and Shafer.


INTRODUCTION 

In design, the uncertainty of data strongly hinders the validation of mathematical models. Many difficulties exist
in recognising the truth of any proposition. This study aims at the construction of a simple
model to verify data when the analysis cannot be based upon a two-valued logic. The model cannot be
based upon two-valued logic since, in many phases of a project, the situation is unclear from the truth-functional point
of view, and the belief in a concept must often be modified on the basis of new information. We will refer to
the semantics of Kripke (immersion in a definite possible world) and to the process of verification and
confirmation of Carnap. The analysis proposed here is based on the many-valued logic of Lukasiewicz [1].

According to N. Wiener's model of reasoning [2], it is fundamental to follow the flow of information about the
action in progress, since this makes it possible to generate feedback. In the choice of the action to take, the principle
of feedback means that performance is compared periodically with the desired results, and that the success
or failure of this performance modifies future actions. The comparison is founded on the measurement of
information, and information is based on the values of probabilities. In every design, in situations where the
information is incomplete and the truth value of a proposition is indeterminate, we can decide what is
possibly true and possibly false and continue to work on the basis of plausible reasoning as if there were no lack
of information. Wiener's model of reasoning is coherent with the process of learning developed by Bayes.

Denoting various propositions by A, B, C, etc., and letting the proposition $A \cap B$ be true and $\neg A$ be false, we have
Laplace's mathematical representation of the process of learning [3] (Bayes' Theorem):

$$p(A \mid B \cap C) = p(A \mid C)\,\frac{p(B \mid A \cap C)}{p(B \mid C)} \tag{1}$$

Bayes' Theorem is an algorithm for updating results when new information is acquired. $p(A \mid C)$ is the prior
probability for A when we know only C, and $p(A \mid B \cap C)$ is the posterior probability, which is updated
as a result of acquiring new information B. A represents the hypothesis under analysis and C represents
what we know about A (table of truth) before getting B (new data). The problem is the correct analysis of
B in the presence of uncertainty. The learning process is based on probability, so the
interpretation of probability is fundamental. We use Carnap's [4] interpretation: the probability of a statement is the
degree of confirmation that the empirical evidence gives to the statement.
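As a minimal sketch of the updating step in Bayes' Theorem, the following Python function computes the posterior p(A | B, C) from the prior p(A | C) and the two likelihoods of the new data B. The function name `bayes_update` and the numbers are illustrative assumptions, not values from this paper; p(B | C) is expanded by the law of total probability over A and not-A.

```python
def bayes_update(prior_a, like_b_given_a, like_b_given_not_a):
    """Return p(A | B, C) from p(A | C), p(B | A, C) and p(B | not A, C)."""
    if not 0.0 <= prior_a <= 1.0:
        raise ValueError("prior must lie in [0, 1]")
    # p(B | C) by the law of total probability over A and not-A
    evidence = like_b_given_a * prior_a + like_b_given_not_a * (1.0 - prior_a)
    if evidence == 0.0:
        raise ValueError("B has zero probability under C; posterior undefined")
    return prior_a * like_b_given_a / evidence


# Hypothetical numbers: prior belief 0.5 in hypothesis A, and new data B that
# is three times more likely when A holds than when it does not.
posterior = bayes_update(0.5, 0.6, 0.2)
```

With these assumed inputs the belief in A rises from 0.5 to 0.75; feeding the posterior back in as the next prior gives the step-by-step learning process described above.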


DESIGN  PROBLEMS 

Probability is at the base of the process of learning and its values can only be positive. A process of
planning must run in a finite time $t_D = \sum_i \Delta t_i$, equal to the sum of the durations $\Delta t_i$ of the various phases.


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




Planning that uses an evolutionary logic cannot be based on two values (true or false). In two-valued
logic, data that cannot be verified during initial planning would be discarded definitively, even though they could be
used in a later phase. Better results can be obtained using a three-valued logic in which the third logical
value is introduced to represent values not yet verified as true or false. The analysis of vague and
contradictory data, with any degree of reliability, involves the use of evolutionary methodologies.

To represent possibility or indeterminacy, Lukasiewicz proposed a third value of truth-logic. He wrote [5]:

The third logical value may be interpreted as "possibility" and may be symbolised as 1/2.

If we want to formulate a system of three-valued logic, we have to supplement the
principles concerning 0 and 1 by the principles concerning 1/2.

Table  1:  Tables  of  truth 


In the three-valued logic of T, $\neg$T and 1/2, a third, indeterminate value 1/2 is added to the two classical
truth-values True and False. The value 1/2 represents an indeterminate or
possible value of truth. The truth tables are shown in Table 1.
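The three-valued connectives can be sketched in Python as follows. This assumes the standard Lukasiewicz definitions (negation 1 - a, conjunction min, disjunction max, implication min(1, 1 - a + b)); the function names are illustrative, and exact fractions are used so that the value 1/2 is represented without rounding.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
VALUES = (Fraction(0), HALF, Fraction(1))  # false, possible, true


def l_not(a):
    """Negation: 1 - a."""
    return 1 - a


def l_and(a, b):
    """Conjunction: the smaller of the two truth values."""
    return min(a, b)


def l_or(a, b):
    """Disjunction: the larger of the two truth values."""
    return max(a, b)


def l_implies(a, b):
    """Lukasiewicz implication: min(1, 1 - a + b)."""
    return min(Fraction(1), 1 - a + b)


def truth_table(op):
    """Tabulate a binary connective over all pairs of the three values."""
    return {(a, b): op(a, b) for a in VALUES for b in VALUES}
```

Restricted to 0 and 1 these connectives agree with classical logic, while `l_or(HALF, l_not(HALF))` evaluates to 1/2, so the law of excluded middle is no longer a tautology.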

These  rules  are  coherent  with  all  the  rules  for  two  values  of  truth.  The  objective  is  to  capture  truth  in  the 
presence  of  incompleteness  and  to  exclude  ambiguity.  The  data,  if  true,  can  be  included  in  a  set  of  true  and 
verified  elements.  Updating  the  truth  set  allows  passage  from  one  belief  situation  to  another  up  to  the  end 
of  the  project. 

For our applications we will assume that "possible propositions" are such that, in the time interval $\Delta t_p$
that we have at our disposal, they cannot be verified or falsified.


A proposition is true if:      it is verified
                               and it is independent of time.

A proposition is false if:     it has never been true or possible
                               and it is independent of time.

A proposition is possible if:  in the time interval $\Delta t_p$ that we have at our disposal,
                               it cannot be verified or falsified.

Table 2: Propositions

The data $X = \{x_i\}$ can be incomplete or contradictory, with beliefs simultaneously true, $T(X)$, and false, $F(X)$.
In the presence of contradictory elements $T(X)$ and $T(\neg X)$, it is necessary to verify the falsehood of the
contradiction $F(X \wedge \neg X)$; otherwise it is necessary to appraise whether the data is false, $F(X)$, or vague, $F(X^{1/2})$.
From the data we can have

$$\{T(X),\ T(\neg X)\} \vee \{F(X \wedge \neg X)\} \Rightarrow T(X \wedge \neg X) \tag{2}$$




The logical procedure of analysis can be defined synthetically as follows:

Analysis of the data {x}.

If the data is false, it does not serve in the planning.

If the data contains contradictions, $T(X)$ and $T(\neg X)$, we must analyse the validity of $F(X \wedge \neg X)$. It
is necessary to verify the contradictions:

$$\text{If } \big(\{T(X),\ T(\neg X)\} \vee \{F(X \wedge \neg X)\} \Rightarrow T(X \wedge \neg X)\big) \text{ then } F(X) \text{ vel } F(X^{1/2}) \tag{3}$$

If the proposition produces values of truth, then the data passes to the set of truth-data.


LEARNING USING SHAFER'S RULE

For handling truth, in its conventional sense of degree of plausibility and reliability (but also of satisfaction),
the rules of Dempster and Shafer combine data coming from different origins. The intersection of sets of
information (assuming that the information is, by definition, true) tends to increase belief up to a
degree of reliability at which a hypothesis is definitely confirmed or denied. A lack of convergence of
the analysis means that there is no real competition between the competing hypotheses, and it will be
necessary to retry with another set of hypotheses. This fact is framed in the more general problem of reasoning that
violates both the law of excluded middle and the law of non-contradiction.

As an example, we mention the decision process of some town planners in a large Italian city. It
concerns the planning of certain areas of the city for use as a tourist zone, as a park, or
simply as a residential area, with all the consequences that each of these choices involves.

For this purpose, some categories of experts and citizens of the zone (tourist operators, industrial workers,
etc.) have been called together to express their views, which take into account various factors such as personal
demands, business affairs, etc. The results of the investigation have been elaborated with fuzzy techniques
[6,7]. The operation, though rather complex, can be performed, in our opinion, using the statistics of
Dempster and Shafer (DS theory) [8].

For handling data in the presence of uncertainty, Shafer refined Dempster's theory [9], including Bayesian
probability as a special case, and introduced the belief function as a lower probability and the plausibility
function as an upper probability. Evidence theory affords us the opportunity to combine information
coming from different origins.

Probabilities are apportioned to subsets and the mass $m_i$ can move over each element. Let the finite non-empty
set $\Theta = \{x_1, \ldots, x_n\}$ be the frame of discernment. The basic probability is assigned in the range $[0, 1]$
to the $2^{\Theta}$ subsets of $\Theta$, each consisting of a singleton or a conjunction of singletons of elements $x_i$. The basic
probability is a function which assigns a weight to each subset such that

$$\sum_{A \subseteq \Theta} m(A) = 1, \qquad m(\emptyset) = 0 \tag{4}$$

The lower probability $P_*(A_j)$ is defined as

$$P_*(A_j) = \sum_{A_i \subseteq A_j} m(A_i) \tag{5}$$

while the upper probability $P^*(A_j)$ is defined as

$$P^*(A_j) = 1 - \sum_{A_i \subseteq \neg A_j} m(A_i) \tag{6}$$

Without respect to additivity, the $m(A_i)$ values are independent basic values of probability inferred on
each subset $A_i$. The belief function of a set M is given by

$$\mathrm{Bel}(M) = \sum_{A_i \subseteq M} m(A_i) \tag{7}$$
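The lower and upper probabilities can be illustrated with a short Python sketch. It assumes a mass function stored as a dictionary from `frozenset` focal elements to weights; the helper names `belief` and `plausibility` are illustrative, and the assignment used is the distribution $m_1$ of the town-planning example discussed later in this paper.

```python
def belief(mass, subset):
    """Lower probability: total mass of focal elements contained in subset."""
    target = frozenset(subset)
    return sum(w for focal, w in mass.items() if focal <= target)


def plausibility(mass, subset):
    """Upper probability: one minus the mass committed to the complement."""
    target = frozenset(subset)
    return 1.0 - sum(w for focal, w in mass.items() if focal.isdisjoint(target))


FRAME = frozenset({"x1", "x2", "x3"})
m1 = {frozenset({"x1"}): 0.3, frozenset({"x2", "x3"}): 0.5, FRAME: 0.2}

bel_x1 = belief(m1, {"x1"})        # only {x1} itself lies inside {x1}
pl_x1 = plausibility(m1, {"x1"})   # {x2, x3} is the only disjoint focal element
```

For this assignment the belief in {x1} is 0.3 and its plausibility is 0.5; the gap between the two is the mass 0.2 left on the whole frame, which is not committed either way.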


If $m_1$ and $m_2$ are independent basic probabilities from independent evidence, and $\{A_{1i}\}$ and $\{A_{2j}\}$ are the
sets of focal points, then the theorem of Shafer gives the rule of combination. If $m_1(A_{1i})\, m_2(A_{2j}) > 0$, then

$$m_1 \oplus m_2(A_k) = m(A_k) = \frac{\sum_{A_{1i} \cap A_{2j} = A_k} m_1(A_{1i})\, m_2(A_{2j})}{1 - \sum_{A_{1i} \cap A_{2j} = \emptyset} m_1(A_{1i})\, m_2(A_{2j})}, \qquad A_k \neq \emptyset \tag{8}$$

with $m(\emptyset) = 0$, assigns the basic probability. In the rule, $m = m_1 \oplus m_2$ denotes the combination of the basic
probabilities.
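The rule of combination can be sketched in Python as follows. This is a minimal illustration, assuming mass functions stored as dictionaries from `frozenset` focal elements to weights; the function name `combine` is an assumption. The inputs are the assignments $m_1$, $m_2$ and $m_3$ (in their DS-theory form) of the town-planning example that follows.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions."""
    joint = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            joint[inter] = joint.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # product mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence; the rule is undefined")
    norm = 1.0 - conflict
    return {focal: w / norm for focal, w in joint.items()}


X1 = frozenset({"x1"})
X23 = frozenset({"x2", "x3"})
FRAME = frozenset({"x1", "x2", "x3"})

m1 = {X1: 0.3, X23: 0.5, FRAME: 0.2}
m2 = {X23: 0.4, FRAME: 0.6}
m3 = {X1: 0.7, FRAME: 0.3}

m12 = combine(m1, m2)  # conflict 0.12, so the products are divided by 0.88
m13 = combine(m1, m3)  # conflict 0.35, so the products are divided by 0.65
```

Here `m12[X1]` is 0.18/0.88, about 0.2045, and `m13[X1]` is 0.44/0.65, about 0.6769, which agrees with the combined values quoted in Steps 1 and 2 of the example.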


Consider the planning of a new area of a city in which three typologies of installation exist; we want to
program an adjoining area with the same typologies using the indications of the citizens.


Typology $x_1$: area destined for park
Typology $x_2$: area destined for sport
Typology $x_3$: area destined for residences


The initial situation is constituted by mixed areas of the type $\{x_1\}$, $\{x_2, x_3\}$ and $\{x_1, x_2, x_3\}$ with the
distribution $m_1(\{\text{park}\}) = 0.3$, $m_1(\{\text{sport, residence}\}) = 0.5$, $m_1(\{\text{park, sport, residence}\}) = 0.2$.

Two categories of citizens, consulted for their wishes on the future planning, give these results. First result:
$m_2(\{\text{sport, residence}\}) = 0.5$, $m_2(\{\text{park, sport, residence}\}) = 0.6$; in DS theory $m_2(\{x_2, x_3\}) = 0.4$,
$m_2(X) = 0.6$.

Second result: $m_3(\{\text{residence}\}) = 0.7$, $m_3(\{\text{park, sport, residence}\}) = 0.3$; in DS theory $m_3(\{x_1\}) = 0.7$,
$m_3(X) = 0.3$.


The two results are in clear contradiction, since there is much uncertainty. Using DS theory, the problem in the
presence of uncertainty can be analysed as follows:

Let $X = \{x_1, x_2, x_3\}$ be a finite set and consider the basic probability assignment as a table of truth:
$m_1(\{x_1\}) = 0.3$, $m_1(\{x_2, x_3\}) = 0.5$, $m_1(X) = 0.2$, with the constraint $K$: $m(\{x_1\}) > 0.25$.

Step 1: Let the data from citizens be: $m_2(\{x_2, x_3\}) = 0.4$, $m_2(X) = 0.6$.

The distribution of Shafer gives:

$m_1 \oplus m_2(\{x_1\}) = 0.2045$,
$m_1 \oplus m_2(\{x_2, x_3\}) = 0.6590$ and
$m_1 \oplus m_2(X) = 0.1363$


with a final distribution $m(x_1) = 0.1764$, $m(x_2) = 0.41176$ and $m(x_3) = 0.41176$.

The value $m_1 \oplus m_2(\{x_1\}) < 0.25$ is not compatible with the constraint K, so in this analysis step it is
not possible to use the data of basic probability $m_2(\{x_2, x_3\}) = 0.5$, $m_2(X) = 0.6$. An analysis of new data
is necessary.




Step 2: Let the data be: $m_3(\{x_1\}) = 0.7$, $m_3(X) = 0.3$, and so:

$m_1 \oplus m_3(\{x_1\}) = 0.6769$,

$m_1 \oplus m_3(\{x_2, x_3\}) = 0.23077$, and

$m_1 \oplus m_3(X) = 0.09231$

with a final distribution $m(x_1) = 0.5434$, $m(x_2) = 0.2282$ and $m(x_3) = 0.2282$.

The value $m_1 \oplus m_3(\{x_1\}) > 0.25$ is compatible with the constraint K.

Step 3: Let $m_4(\{x_1\}) = m_1 \oplus m_3(\{x_1\}) = 0.46$, $m_4(\{x_2, x_3\}) = m_1 \oplus m_3(\{x_2, x_3\}) = 0.24$ and
$m_4(X) = m_1 \oplus m_3(X) = 0.30$. So now:

$m_1 \oplus m_4(\{x_1\}) = 0.5581$,
$m_1 \oplus m_4(\{x_2, x_3\}) = 0.3661$ and
$m_1 \oplus m_4(X) = 0.0757$

with a final distribution $m(x_1) = 0.4172$, $m(x_2) = 0.29114$ and $m(x_3) = 0.29114$.

The value $m_1 \oplus m_4(\{x_1\}) > 0.33$ is compatible with the constraint K. So we have been able to
rescue the information $m_2(\{x_2, x_3\}) = 0.5$, $m_2(X) = 0.6$ by using an evolutionary technique in which the data
is not considered to be false but possible, since we did not have the necessary information to discard it.


CONCLUSION 

Models must foresee some choice between different solutions, or the adaptation of a present choice to the
phenomena of a following phase. Problems are resolved by induction. There is the possibility that
an approved choice is rejected and that the process returns to the preceding phase, or vice versa. This exposes the insufficiency
of classical logic for managing the complete procedure. Different logics have been proposed that foresee
violation of the excluded-middle principle and/or that of non-contradiction. A fundamental element is the
evaluation of the degree of reliability (the degree of truth) of the information. The trivalent logic of
Lukasiewicz has been used. The third value of truth, expressed as 1/2, is used to point out indefinite elements.
Among the possible methods to check the truth (verification of Wittgenstein, verification and
confirmation of Carnap, falsification of Popper, etc.), the proposal of Carnap appears to be the best.


REFERENCES 

1. Lukasiewicz J., 1970. Modal Logic. Polish Scientific Publishers, Warszawa.

2. Wiener N., 1950. The Human Use of Human Beings. Houghton Mifflin Company, Boston.

3. Jaynes E.T., 1985. Bayesian Methods: General Background, in Proceedings Volume, Maximum
Entropy and Bayesian Methods in Applied Statistics. J.H. Justice, Ed., Cambridge University Press, 1-25.

4. Logic, Language, and the Structure of Scientific Theories: Proc. of the Carnap-Reichenbach Centennial,
University of Konstanz, May 1991. Pittsburgh: University of Pittsburgh Press; University of Konstanz.

5. Lukasiewicz J., 1970. On Three-Valued Logic, in L. Borkowski, Ed., Selected Works of Jan Lukasiewicz.
Holland Publishing Company, Warsaw.

6. Zadeh L.A., 1965. Fuzzy sets. Information and Control, 8, 338-353.

7. Zimmermann H.J., 1985. Fuzzy Set Theory and its Applications. Boston.

8. Shafer G., 1976. A Mathematical Theory of Evidence. Princeton University Press.

9. Terano T., Asai K., Sugeno M., 1992. Fuzzy Systems Theory and Its Applications. Academic Press Inc.,
San Diego, CA.




Modulus  Genetic  Algorithm  and  its 
Application  to  Fuzzy  System  Optimization 

Sinn-Cheng  Lin 

Department  of  Educational  Media  and  Library  Sciences, 

Tamkang  University,  Taipei,  Taiwan,  R.O.C. 

ABSTRACT 

The conventional genetic algorithm encodes the searched parameters as binary strings. After applying the
basic genetic operators such as reproduction, crossover and mutation, a decoding procedure is used to
convert the binary strings back to the original parameter space. As a result, such an encoding/decoding
procedure leads to considerable numeric errors. This paper proposes a new algorithm called the modulus
genetic algorithm (MGA) that uses the modulus operation to resolve this problem. In the modulus genetic
algorithm, the encoding/decoding procedure is not necessary. It has the following advantages: 1) the
evolution can be sped up; 2) the numeric truncation error can be avoided; 3) the precision of the solution
can be increased.

The proposed MGA is applied to resolve the key problem of fuzzy inference systems: rule acquisition.
The fuzzy system with MGA as its learning mechanism forms an "intelligent fuzzy system". Based on the
proposed approach, the fuzzy rule base can be self-extracted and optimized. Such an intelligent fuzzy
system has a general-purpose architecture and can be applied in many fields.


INTRODUCTION 

Genetic algorithms (GAs) are parallel and global search techniques, which take their concepts from evolution
theory and natural genetics. They emulate biological evolution by means of genetic operations such as
reproduction, crossover and mutation. Usually, genetic algorithms are used as optimization techniques [1]-[5].
Although there is no necessary and sufficient condition on the functions which are optimizable by
genetic algorithms, it has been shown that GAs perform well on multimodal functions (functions which
have multiple local optima). Moreover, various studies have shown that whenever GAs failed to find the
optimal solution of a function, other known techniques failed as well [2].

Conventionally, a genetic algorithm works with a set of artificial elements (binary strings, e.g. 10101010),
called a population. An individual (string) is referred to as a chromosome, and a single bit in the string is
called a gene. The GA generates a new population (called offspring) by applying the genetic operators to the
chromosomes in the old population (called parents). An iteration of the genetic operations is referred to as a
generation. A fitness function, i.e. the function to be maximized, is used to evaluate the fitness of an
individual. One of the important purposes of GAs is to preserve the better schemata, i.e. the patterns of
certain genes, so that the offspring may yield higher fitness than their parents. Consequently, the value of the
fitness function increases from generation to generation. In most genetic algorithms, mutation is a
random-walk mechanism to avoid becoming trapped in a local optimum. As a result, GAs can, theoretically,
find the globally optimal solution.

The basic disadvantage of the conventional genetic algorithm is that it encodes the searched parameters as
binary strings. After applying the basic genetic operators such as reproduction, crossover and mutation, a
decoding procedure has to be used to convert the binary strings back to the original parameter space. As a
result, such an encoding/decoding procedure leads to considerable numeric errors. This paper proposes a
new algorithm called the modulus genetic algorithm (MGA) that uses the modulus operation to resolve this
problem. In the MGA, the encoding/decoding procedure is not necessary. It has the following advantages:
1) the evolution can be sped up; 2) the numeric truncation error can be avoided; 3) the precision of the
solution can be increased.




The MGA is applied to resolve the key problem of fuzzy logic systems: rule acquisition [8]. The fuzzy
system with MGA as its learning mechanism forms an "intelligent fuzzy system". Based on the proposed
approach, the fuzzy rule base can be self-extracted and optimized. Such an intelligent fuzzy system has a
general-purpose architecture. It can be applied in many fields, such as fuzzy control, fuzzy image
processing, fuzzy decision making, and fuzzy pattern recognition.

This paper is organized as follows: Section 1 is an introduction. Section 2 describes the details of the MGA.
In Section 3, the MGA is used to build an intelligent fuzzy inference system. Section 4 applies the MGA-based
fuzzy inference system to the field of fuzzy control. Conclusions are drawn in Section 5.


THE  MODULUS  GENETIC  ALGORITHM 

Reproduction  of  MGA 

The  Darwinian  "survival  of  the  fittest"  is  the  underlying  spirit  of  reproduction.  This  operation,  actually,  is 
an  artificial  version  of  natural  selection. 


Let F be the fitness function, and $F_i$ denote the value of the fitness function associated with the individual
string i in the current population. Reproduction is a process in which individual strings in the current
population are copied according to their fitness function values $F_i$. A higher $F_i$ value indicates a better fit
(or larger benefit). To perform reproduction, first, the $F_i$'s are calculated. Next, the current individual strings
are probabilistically selected and copied into a mating pool according to their fitness values. This
arrangement allows the strings with a higher fitness to have a greater probability of contributing a larger
number of offspring to the new population. The easiest way of implementing a reproduction operator is to
create a biased roulette wheel whose slot sizes are proportional to the fitness value of each individual in the
current population. Let $p_{s_i}$ denote the probability of selection of the individual i, and M be the population
size; then an individual string will get selected with the following probability:

$$p_{s_i} = \frac{F_i}{\sum_{j=1}^{M} F_j} \tag{1}$$
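The biased roulette wheel described above can be sketched in Python as follows, assuming non-negative fitness values and a selection probability equal to each individual's share of the total fitness. The function names are illustrative, not the author's.

```python
import random


def selection_probabilities(fitness):
    """Slot sizes of the roulette wheel: each fitness divided by the total."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("roulette-wheel selection needs a positive total fitness")
    return [f / total for f in fitness]


def roulette_select(population, fitness, rng=random):
    """Draw a mating pool of the same size as the population."""
    probs = selection_probabilities(fitness)
    # random.choices samples with replacement, weighted by the slot sizes
    return rng.choices(population, weights=probs, k=len(population))
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which is convenient when comparing operator settings.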


Crossover  of  MGA 

Crossover provides a mechanism for individual strings to exchange information via a probabilistic process.
Once the reproduction operator has been applied, the members of the mating pool are allowed to mate with one
another. The binary-coded GA takes the following steps to accomplish the crossover: first, two parents are
randomly selected from the mating pool; then, a random crossover point is picked; finally, the parents'
genetic codes (binary digits) following the crossover point are exchanged. This random process provides a
highly efficient method to search the string space for a better solution.

In  MGA,  the  parameters  lie  in  the  original  space  rather  than  binary  space.  Hence,  the  crossover  operation 
has  to  be  modified  to  work  with  parameters  themselves  rather  than  their  binary  codes. 

Let {a, b} and {a', b'} be the parent and offspring parameter pairs, respectively. Their search space is
the range $[\lambda_{\min}, \lambda_{\max}] \subset \mathbb{R}$. The crossover of MGA is proposed as follows:

$$\begin{aligned} a' &= (a - a_0 + b_0)\ \mathrm{MOD}\ \Lambda + \lambda_{\min} \\ b' &= (b - b_0 + a_0)\ \mathrm{MOD}\ \Lambda + \lambda_{\min} \end{aligned} \tag{2}$$

where MOD means the modulus operator, which is why the proposed approach is called the "modulus"
genetic algorithm. The other notations in (2) are: $\Lambda = \lambda_{\max} - \lambda_{\min}$, and

$$a_0 = a\ \mathrm{MOD}\ \alpha\Lambda, \qquad b_0 = b\ \mathrm{MOD}\ \alpha\Lambda$$

in which $\alpha \in [0, 1]$ is called the crossover factor.




In particular, the crossover of a binary-coded GA is a special case of (2) with the following $a_0$ and $b_0$:

$$a_0 = a\ \mathrm{MOD}\ 2^c, \qquad b_0 = b\ \mathrm{MOD}\ 2^c$$

where c denotes the bit number of the crossover point.
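A minimal Python sketch of the modulus crossover, following equation (2) literally, is given below. The function name and the handling of a zero crossover factor (parents returned unchanged) are assumptions made for illustration.

```python
def mga_crossover(a, b, lam_min, lam_max, alpha):
    """Modulus crossover of two real parameters, as written in equation (2)."""
    span = lam_max - lam_min       # Lambda
    modulus = alpha * span         # alpha * Lambda
    if modulus == 0:
        return a, b                # assumption: nothing is exchanged when alpha is 0
    a0 = a % modulus               # the part of each parent that is swapped
    b0 = b % modulus
    a_new = (a - a0 + b0) % span + lam_min
    b_new = (b - b0 + a0) % span + lam_min
    return a_new, b_new


# Check against the binary special case: 4-bit strings on [0, 16) with the
# crossover point at bit c = 2, i.e. alpha * Lambda = 2**2 = 4.
# 13 = 1101 and 6 = 0110 exchange their two low bits: 1110 = 14, 0101 = 5.
children = mga_crossover(13, 6, 0, 16, 0.25)
```

In this example the children are 14 and 5, the same result as exchanging the last two bits of the binary strings, which illustrates why the binary-coded crossover is a special case of the modulus form.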

Mutation  of  MGA 

In the genetic algorithm, the mutation operation introduces new genes into the population so that
trapping in local optima may be avoided. Each gene of an individual is subject to a random
change with a probability equal to the pre-assigned mutation rate. In the binary-coded case, a mutation operator
changes a bit from 0 to 1 or vice versa. In MGA, mutation is a random-walk mechanism. It simply replaces
a parameter, say a, with an arbitrary value, say a', in the search space $[\lambda_{\min}, \lambda_{\max}]$.
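The mutation step can be sketched in Python as follows. This assumes that the arbitrary replacement value is drawn uniformly from the search space and that each parameter is mutated independently with the given rate; both choices, and the function name, are illustrative assumptions.

```python
import random


def mga_mutate(params, lam_min, lam_max, rate, rng=random):
    """Replace each parameter, with probability `rate`, by a uniform random
    value from the search space [lam_min, lam_max]."""
    return [rng.uniform(lam_min, lam_max) if rng.random() < rate else p
            for p in params]
```

Because the replacement is drawn directly in the parameter space, no decoding of the mutated value is needed.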


MGA-BASED  INTELLIGENT  FUZZY  SYSTEM  DESIGN 

Suppose that a fuzzy inference system is described by the following rule base [6]:

$$R_j: \text{IF } x \text{ is } X_j(m_j, \sigma_j) \text{ THEN } y \text{ is } Y_j(\varphi_j) \tag{3}$$

where $j = 1, 2, \ldots, N$, and N is the number of rules; the $X_j$'s and $Y_j$'s are the input and output linguistic
labels [10], respectively. In particular, in this paper the $X_j$'s are simply assigned as Gaussian-shaped membership
functions $\mu_{X_j}(x)$ with parameters $(m_j, \sigma_j)$, and the $Y_j$'s are assigned to be fuzzy singletons, i.e.

$$\mu_{Y_j}(y) = \begin{cases} 1, & y = \varphi_j \\ 0, & y \neq \varphi_j \end{cases}$$


Suppose that the singleton fuzzification and the weighted average defuzzification methods are applied [8];
then the output of (3) is given by:

$$y = \varphi^T p(x) \tag{4}$$

where $\varphi = [\varphi_1, \varphi_2, \ldots, \varphi_N]^T$ and $p = [p_1, p_2, \ldots, p_N]^T$, in which

$$p_j(x) = \frac{\mu_{X_j}(x)}{\sum_{l=1}^{N} \mu_{X_l}(x)}$$
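The weighted-average output (4) can be sketched in Python as follows. The particular Gaussian form exp(-((x - m) / sigma)^2) is an assumption made for illustration (other normalisations of a Gaussian-shaped function are possible), and the function names are hypothetical.

```python
import math


def gaussian(x, center, width):
    """Assumed Gaussian-shaped membership: exp(-((x - center) / width)**2)."""
    return math.exp(-((x - center) / width) ** 2)


def fuzzy_output(x, centers, widths, phis):
    """Singleton-consequent rule base with weighted-average defuzzification."""
    firing = [gaussian(x, m, s) for m, s in zip(centers, widths)]
    total = sum(firing)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    # y = sum_j phi_j * p_j(x), with p_j the normalised firing strength
    return sum(phi * mu for phi, mu in zip(phis, firing)) / total
```

For two rules centred at 0 and 1 with equal widths and singletons 0 and 10, an input of 0.5 fires both rules equally and the output is the midpoint 5.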

Constructing a parameter space to be searched by MGA requires transferring the fuzzy rule base (3) to a
parameter representation. Clearly, the output of the rule base (3) is uniquely determined by the set of
parameters formed by the union of the parameters of the IF part and the THEN part. Hence, the parameter vector to
be learned by the genetic algorithm, $\theta$, is defined as:

$$\theta = [m^T\ \sigma^T\ \varphi^T]^T = [m_1\ m_2\ \cdots\ m_N\ \ \sigma_1\ \sigma_2\ \cdots\ \sigma_N\ \ \varphi_1\ \varphi_2\ \cdots\ \varphi_N]^T \tag{5}$$

Assume that $\lambda_m$, $\lambda_\sigma$, $\lambda_\varphi$ are the search spaces of the $m_j$'s, $\sigma_j$'s and $\varphi_j$'s, respectively; M is the population size;
h is the generation number. The learning procedure of the MGA-based fuzzy system is
described as follows.

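STEP 1: Initially, set h = 0 and randomly generate 3M initial parameter vectors,

$$\begin{aligned} m^{(i)}(h) &= [m_1^{(i)}(h)\ m_2^{(i)}(h)\ \cdots\ m_N^{(i)}(h)]^T \\ \sigma^{(i)}(h) &= [\sigma_1^{(i)}(h)\ \sigma_2^{(i)}(h)\ \cdots\ \sigma_N^{(i)}(h)]^T \\ \varphi^{(i)}(h) &= [\varphi_1^{(i)}(h)\ \varphi_2^{(i)}(h)\ \cdots\ \varphi_N^{(i)}(h)]^T \end{aligned}$$

where $m_j^{(i)}(h) \in \lambda_m$, $\sigma_j^{(i)}(h) \in \lambda_\sigma$ and $\varphi_j^{(i)}(h) \in \lambda_\varphi$ (i = 1, 2, ..., M; j = 1, 2, ..., N).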

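If the i-th candidate MGA-based fuzzy inference system is denoted by $FIS^{(i)}$, then the fuzzy rule base
of $FIS^{(i)}$ can be created as:

$$FIS^{(i)}: \begin{cases} R_1^{(i)}: \text{IF } x \text{ is } X_1^{(i)}(m_1^{(i)}, \sigma_1^{(i)}) \text{ THEN } y \text{ is } Y_1^{(i)}(\varphi_1^{(i)}) \\ R_2^{(i)}: \text{IF } x \text{ is } X_2^{(i)}(m_2^{(i)}, \sigma_2^{(i)}) \text{ THEN } y \text{ is } Y_2^{(i)}(\varphi_2^{(i)}) \\ \quad \vdots \\ R_N^{(i)}: \text{IF } x \text{ is } X_N^{(i)}(m_N^{(i)}, \sigma_N^{(i)}) \text{ THEN } y \text{ is } Y_N^{(i)}(\varphi_N^{(i)}) \end{cases}$$

where the $X_j^{(i)}$'s and $Y_j^{(i)}$'s are linguistic labels to be learned, and the $m_j^{(i)}$'s, $\sigma_j^{(i)}$'s and $\varphi_j^{(i)}$'s are their
parameters.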

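STEP 2: Construct the parameter vector of the i-th individual,

$$\theta^{(i)}(h) = [\theta_1^{(i)}(h)\ \cdots\ \theta_N^{(i)}(h)\ \ \theta_{N+1}^{(i)}(h)\ \cdots\ \theta_{2N}^{(i)}(h)\ \ \theta_{2N+1}^{(i)}(h)\ \cdots\ \theta_{3N}^{(i)}(h)]^T = [m^{(i)T}(h)\ \ \sigma^{(i)T}(h)\ \ \varphi^{(i)T}(h)]^T$$


STEP 3: Establish the population in the generation h, P(h),

$$P(h) = \{\theta^{(1)}(h),\ \theta^{(2)}(h),\ \ldots,\ \theta^{(M)}(h)\}$$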


STEP 4: Evaluate the fitness value, $F^{(i)}$, of each individual.

STEP 5: Apply the modulus genetic operators, i.e. reproduction of MGA, crossover of MGA and mutation
of MGA, to generate a new population P(h+1), which is known as the offspring of P(h).

STEP 6: Keep the elitist. That is, 1) pick the best-fitted individuals in P(h) and P(h+1); 2) compare their
fitness; 3) if the best individual of P(h) has a better fitness value than that of P(h+1), then randomly replace
an individual in P(h+1) with the elitist.

STEP 7: Use the parameters to calculate the output y of the fuzzy system.

STEP 8: Set h = h + 1; go to Step 2 and repeat the procedure until $F > F_M$ or $h > H$, where $F_M$ and H
denote an acceptable fitness value and a stop generation number, respectively, as specified by the designer.
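The elitist rule of Step 6 can be sketched in Python as follows. The function name and the representation of a population as a list of individuals with a `fitness` callable are assumptions made for illustration.

```python
import random


def keep_elitist(old_pop, new_pop, fitness, rng=random):
    """If the best parent beats the best offspring, overwrite one randomly
    chosen offspring with that parent; otherwise leave the offspring as is."""
    best_old = max(old_pop, key=fitness)
    best_new = max(new_pop, key=fitness)
    result = list(new_pop)
    if fitness(best_old) > fitness(best_new):
        result[rng.randrange(len(result))] = best_old
    return result
```

With this rule the best fitness in the population never decreases from one generation to the next, so the stopping test in Step 8 can rely on a non-decreasing best value.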


AN  APPLICATION  EXAMPLE 

The proposed MGA-based intelligent fuzzy system has a general-purpose architecture. It can be applied in
many fields, such as fuzzy control, fuzzy image processing, fuzzy decision making, and fuzzy
pattern recognition. In this section, an example of an MGA-based fuzzy control system is used to
demonstrate its practicability. Consider a class of nth-order nonlinear systems, expressed by the
following error dynamics [12]:

$$\begin{aligned} \dot{x}_1 &= x_2 \\ \dot{x}_2 &= x_3 \\ &\ \ \vdots \\ \dot{x}_n &= f(x) + g(x)\,u \end{aligned} \tag{6}$$

where $x = [x_1\ x_2\ \cdots\ x_n]^T \in \mathbb{R}^n$ is the state vector; $u \in \mathbb{R}$ is the control input; $f(\cdot)$ is an unknown continuous
function with known upper bound, i.e. $|f| < f_U$; $g(\cdot)$ is an unknown positive definite function with known
lower bound, i.e. $0 < g_L < g$. Equation (6) represents a general uncertain nonlinear dynamic
system. The chief objectives are:

1) Apply MGA for self-extracting an optimal fuzzy control rule-base that minimizes the following
quadratic cost function:

$$J = \int \big( x^T(t)\, Q\, x(t) + u^T(t)\, R\, u(t) \big)\, dt \tag{7}$$

where $Q \in \mathbb{R}^{n \times n}$ and $R \in \mathbb{R}$ are two positive definite weighting matrices.

2) Guarantee the stability of the control system:

$$|x_i| < \delta_i, \qquad i = 1, 2, \ldots, n \tag{8}$$


To simplify the system design, fuzzy sliding mode control (FSMC) [12] is adopted as the control
scheme. Based on our previous work [14], the control law can be represented as:

$$u = (1 - \alpha)\, u_f + \alpha\, u_h, \qquad \alpha = \begin{cases} 1, & \text{for } |s| > \varepsilon \\ 0, & \text{for } |s| \le \varepsilon \end{cases} \tag{9}$$

where $u_f$ is obtained from the following fuzzy control rule-base:

$$R_j: \text{IF } s \text{ is } S_j(m_j, \sigma_j) \text{ THEN } u_f \text{ is } U_j(\varphi_j) \tag{10}$$

where $s(x) = c^T x = \sum_{i=1}^{n} c_i x_i$ is a sliding function and $c = [c_1\ c_2\ \cdots\ c_n]^T \in \mathbb{R}^n$ is the coefficient vector of s.
The optimal coefficients of the sliding function can be determined by the criterion we proposed in [14].


Notice that the rule base (10) is in the form of (3). Therefore, the approach described in the previous section
can be directly applied to find the parameter vector of (10), that is $[m_1\ m_2\ \cdots\ m_N\ \ \sigma_1\ \sigma_2\ \cdots\ \sigma_N\ \ \varphi_1\ \varphi_2\ \cdots\ \varphi_N]^T$.
To minimize the quadratic cost function (7), the fitness function can be defined as:


F  = 


in which $J_s = t_s + \sum_{k=1}^{K} (s^2 + R u^2)$, where $t_s$ denotes the reaching time of the sliding mode; $k = \mathrm{int}(t / \Delta t)$ denotes
the iteration instance; $\Delta t$ is the sampling period; int(.) is the round-off operator; $K = \mathrm{int}(t_{\max} / \Delta t)$ denotes
the number of iterations in each run; $t_{\max}$ is the running time of one run.


Moreover, the hitting control law $u_h$ in (9) is designed to guarantee system stability. If $u_h$ is given by [13]:

$$u_h = -\mathrm{sgn}(s) \left[ g_L^{-1} \left( f_U + |\bar{c}^T \bar{x}| + \eta \right) \right] \qquad (11)$$

in which $\bar{c} = [c_1\ c_2\ \cdots\ c_{n-1}]^T$ and $\bar{x} = [x_2\ x_3\ \cdots\ x_n]^T$, then the sliding condition, $s\dot{s} \le -\eta |s|$, is satisfied for
$|s| > \delta$, and the control system is stable in the sense that all system states $x_i$ ($i = 1, 2, \ldots, n$) are bounded by:

$|x_i(t)| \le$
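
The role of the hitting control law can be checked numerically. The sketch below is only an illustration under stated assumptions: it assumes the canonical form $\dot{x}_i = x_{i+1}$, $\dot{x}_n = f + g u$ with $c_n = 1$, so that $\dot{s} = \bar{c}^T \bar{x} + f + g u$; the function names and the numeric bounds are hypothetical and are not taken from the paper.

```python
import random

def hitting_control(s, f_upper, g_lower, cbar_dot_xbar, eta):
    """Hitting law of Eq. (11): u_h = -sgn(s) * (f_U + |cbar^T xbar| + eta) / g_L."""
    sign = (s > 0) - (s < 0)
    return -sign * (f_upper + abs(cbar_dot_xbar) + eta) / g_lower

def s_dot(f, g, cbar_dot_xbar, u):
    """Assumed dynamics of s for x_i' = x_{i+1}, x_n' = f + g*u and c_n = 1."""
    return cbar_dot_xbar + f + g * u

# Illustrative bounds only (assumptions, not values from the paper).
f_upper, g_lower, eta = 2.0, 0.5, 0.1
rng = random.Random(0)
worst_margin = float("-inf")
for _ in range(10_000):
    s = rng.uniform(-5.0, 5.0)
    f = rng.uniform(-f_upper, f_upper)   # any f with |f| <= f_U
    g = rng.uniform(g_lower, 3.0)        # any g with g >= g_L > 0
    cx = rng.uniform(-4.0, 4.0)          # value of cbar^T xbar
    u = hitting_control(s, f_upper, g_lower, cx, eta)
    # The sliding condition s*s_dot <= -eta*|s| means this margin is never positive.
    worst_margin = max(worst_margin, s * s_dot(f, g, cx, u) + eta * abs(s))

print(worst_margin <= 1e-9)
```

Under these assumptions the margin stays non-positive for every sampled uncertainty, which is the sense in which the hitting law enforces the sliding condition regardless of the unknown $f$ and $g$.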


For example, consider an underwater vehicle whose simplified model is represented as [11]:


$$\dot{x}_1 = x_2$$

$$\dot{x}_2 = -\frac{d}{m}\, x_2 |x_2| + \frac{1}{m}\, u$$

where $x_1$, $x_2$ represent the position error and velocity error of the vehicle, respectively; $u$ is the control
force; $m$ is the mass of the vehicle; and $d$ denotes the drag coefficient. The parameter values used in [11]
are also adopted in the following simulations, i.e. $m = 3 + 1.5\sin(|x_2|\, t)$ and $d = 1.2 + 0.2\sin(|x_2|\, t)$.


Suppose that the weighting matrices are selected as $Q = \begin{bmatrix} 2 & 0.5 \\ 0.5 & 1 \end{bmatrix}$, $R = 0.1$; the population size, the
crossover rate and the mutation rate are selected as 10, 0.8 and 0.03, respectively. Based on [14], the
optimal coefficients of the sliding function can be derived as $c = [1.4142\ \ 1]^T$. Six fuzzy rules are created in
this simulation. Fig. 1 shows the evaluation result of the cost function and Fig. 2 shows the final state-space
response after learning.




CONCLUSION 

In this paper, a new approach called the modulus genetic algorithm was described. The numeric error that
arises from the encoding/decoding procedure in conventional GAs is avoided, because the modulus genetic
algorithm needs no encoding/decoding procedure. It has the following advantages: 1) the
evolution can be sped up; 2) the numeric truncation error can be avoided; 3) the precision of the solution
can be increased.

The MGA was applied to resolve the key problem of fuzzy logic systems: rule acquisition. The fuzzy
system with MGA forms an “intelligent fuzzy system”. Based on the proposed learning step, the fuzzy rule
base can be self-extracted and optimized. Such an intelligent fuzzy system has a general-purpose
architecture. It can be applied to many fields, such as fuzzy control, fuzzy image processing, fuzzy
decision making, and fuzzy pattern recognition.


REFERENCES 

1. Back, T., 1993. Optimal mutation rates in genetic search, Proc. 5th Int. Conf. on GAs, 2-8.

2. Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley.

3. Goldberg, D.E., 1991. Real-coded genetic algorithms, virtual alphabets, and blocking, Complex
Systems, 5, 129-167.

4. Karr, C.L., 1991. Applying genetics to fuzzy logic, AI Expert, March, 38-43.

5. Karr, C.L., 1991. Genetic algorithms for fuzzy controller, AI Expert, February, 26-33.

6. Lin, S.C., Chen, Y.Y., 1997. Design of self-learning fuzzy sliding mode controller based on genetic
algorithms, Fuzzy Sets and Systems, 86, 139-153.

7. Tate, D.M., Smith, A.E., 1993. Expected allele coverage and the role of mutation in genetic
algorithms, Int. Conf. on Genetic Algorithms, 31-37.

8. Wang, L.X., 1994. Adaptive Fuzzy Systems and Control, Englewood Cliffs, NJ: Prentice Hall.

9. Zadeh, L.A., 1965. Fuzzy sets, Information and Control, 8, 338-353.

10. Zadeh, L.A., 1973. Outline of a new approach to the analysis of complex systems and decision
processes, IEEE Trans. on Systems, Man and Cybernetics, (SMC3), 28-44.

11. Lewis, F.L., 1992. Applied Optimal Control and Estimation, Englewood Cliffs, NJ: Prentice Hall.

12. Lin, S.C., Chen, Y.Y., 1997. Design of self-learning fuzzy sliding mode controller based on genetic
algorithms, Fuzzy Sets and Systems, 86(2).

13. Lin, S.C., Chen, Y.Y., 1995. Design a hitting controller to stabilize the fuzzy sliding mode control
systems, Proc. Int. Joint Conf. CFSA/IFIS/SOFT’95 - Fuzzy Theory and Apps., Taipei, 374-379.

14. Lin, S.C., 1997. Stable Self-Learning Optimal Fuzzy Control System Design and Application, Ph.D.
Dissertation, National Taiwan University.




FUZZY  EVOLUTIONARY  PROGRAMMING 
FOR  PORTFOLIO  SELECTION  IN  INVESTMENT 

Tu  Van  Le 

School  of  Computing 

University  of  Canberra,  ACT  2601  Australia 
Email:  vanle@ise.canberra.edu.au 

ABSTRACT 

The  problem  of  portfolio  selection  in  investment  is  concerned  with  minimizing  the  risk  for  a  prespecified 
level  of  return.  In  this  paper,  the  constraint  on  the  level  of  return  is  fuzzified  and  the  technique  of  fuzzy 
evolutionary  programming  is  used  to  select  an  optimal  portfolio  of  securities  with  low  risk  and  a  highly 
acceptable level of total return. Experimental results show the method is highly effective. The problem of
selecting a portfolio with low risk and a high probability of achieving the expected return is resolved in the same manner.

INTRODUCTION 

Portfolio  selection  is  a  crucial  task  in  investment.  An  investor  is  faced  with  a  choice  from  among  an 
enormous  number  of  securities.  The  decisions  on  which  securities  to  invest  in  and  on  the  proportions  of 
investment  in  those  securities  are  highly  complicated.  The  two  principal  factors  that  concern  a  portfolio 
selection  are  its  risk  and  its  expected  total  return.  As  broadly  acknowledged,  most  investors  hold  strong 
aversion  to  risk  and  would  rather  accept  a  modest  return  for  low  risk.  Therefore,  Markowitz  [4]  has 
formulated  the  problem  of  portfolio  selection  into  minimizing  the  risk  provided  that  its  expected  total  return 
is  above  some  prespecified  threshold. 

So,  portfolio  selection  is  a  constrained  optimization  problem.  Theoretically,  this  problem  can  be  resolved 
mathematically  by  solving  systems  of  quadratic  equations.  In  practice,  however,  due  to  the  large  number  of 
securities,  a  mathematical  approach  requires  a  great  deal  of  programming  computation  [2,  5].  Recently, 
Watanabe  et  al.  [6]  employed  a  Boltzmann  machine  to  solve  the  portfolio  selection  problem  by  maximizing 
the  energy  function:  Expected  Return  -  Risk.  In  reality,  for  most  investors,  the  constraint  on  expected  total 
return  is  tolerable,  particularly  in  the  case  where  investment  risk  is  unexpectedly  too  high  and  needs  to  be 
lowered.  Thus,  one  may  relax  the  constraint  and  allow  it  to  be  satisfied  to  a  certain  degree  in  order  to  find  a 
compromised  optimal  solution.  In  [3],  I  have  presented  an  evolutionary  approach  to  solve  this  type  of 
constrained  optimization  problem  based  on  a  combination  of  fuzzy  logic  and  evolutionary  programming.  In 
this  paper,  the  method  of  fuzzy  evolutionary  programming  is  employed  to  solve  the  portfolio  selection 
problem,  according  to  which  the  degree  of  satisfying  the  total  return  expectation  is  used  as  a  weight  factor 
for  potential  solutions. 

PORTFOLIO  SELECTION  IN  INVESTMENT 

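Consider N securities having expected returns $r_i$ with variances $\sigma_i^2$ and covariances $\sigma_{ij}$, $0 < i, j \le N$. The
portfolio selection problem is expressed as

Minimize $\sum_{i=1}^{N} \sigma_i^2 x_i^2 + \sum_{i=1}^{N} \sum_{j=1,\, j \ne i}^{N} \sigma_{ij} x_i x_j$

Subject to $\sum_{i=1}^{N} x_i = 1$,  (1)

$\sum_{i=1}^{N} x_i r_i \ge E$,  (2)

$x_i \ge 0,\ i = 1, \ldots, N$  (3)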


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




where $x_i$ is the fraction of funds invested in the ith security, and E is a prespecified expected total return.

Here, we consider the constraint on the expected total return as tolerable. That is, the investor would be
completely happy if the total return is greater than or equal to E, but it is still acceptable to some degree if
this total return is below the expectation E. Thus, condition (2) is replaced with a fuzzy
constraint

$$\mu_{\phi(E,\varepsilon)}\left( \sum_{i=1}^{N} x_i r_i \right)$$

where the fuzzy set $\phi(a,\varepsilon)$ is defined by

$$\mu_{\phi(a,\varepsilon)}(x) = \begin{cases} 1 & \text{if } x \ge a \\ \dfrac{e^{-p(a-x)/\varepsilon} - e^{-p}}{1 - e^{-p}} & \text{if } a - \varepsilon \le x < a \\ 0 & \text{otherwise} \end{cases}$$

where p is a parameter used to control the shape of the membership function. Intuitively, we allow a high
degree of tolerance if $\sum_{i=1}^{N} x_i r_i$ is smaller than E, but close to E, and we make this tolerance decrease
rapidly as the gap increases.

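The portfolio selection problem is now redefined as follows, where for simplicity we use the notation $\sigma_{ii}$ to
denote the variance $\sigma_i^2$.

Maximize $\left( 1 - \sum_{i=1}^{N} \sum_{j=1}^{N} \sigma_{ij} x_i x_j \right) \mu_{\phi(E,\varepsilon)}\left( \sum_{i=1}^{N} x_i r_i \right)$

Subject to $\sum_{i=1}^{N} x_i = 1$,

$x_i \ge 0,\ i = 1, \ldots, N$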

Thus, the degrees of constraint satisfaction are used as weight factors of potential solutions. As the
objective function in this problem is not differentiable everywhere, the classical mathematical method is
inapplicable. A fuzzy evolutionary programming approach to solving this problem is presented in the next
section.
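
To make the redefined objective concrete, the following is a minimal sketch of how it might be evaluated. It assumes the exponential membership form $(e^{-p(a-x)/\varepsilon} - e^{-p})/(1 - e^{-p})$ on the tolerance band; the function names and the two-security data are hypothetical and are not the paper's data.

```python
import math

def membership(x, a, eps, p=1.0):
    """Degree to which a return x satisfies the fuzzy constraint 'x >= a' with tolerance eps."""
    if x >= a:
        return 1.0
    if x >= a - eps:
        # Assumed shape: equals 1 at x = a and falls to 0 at x = a - eps.
        return (math.exp(-p * (a - x) / eps) - math.exp(-p)) / (1.0 - math.exp(-p))
    return 0.0

def risk(x, cov):
    """Portfolio variance: sum over i, j of sigma_ij * x_i * x_j."""
    n = len(x)
    return sum(cov[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def fitness(x, returns, cov, target, eps, p=1.0):
    """Objective to maximize: (1 - risk) weighted by the degree of constraint satisfaction."""
    total_return = sum(xi * ri for xi, ri in zip(x, returns))
    return (1.0 - risk(x, cov)) * membership(total_return, target, eps, p)

# Toy two-security example (made-up numbers).
returns = [5.0, 6.0]
cov = [[0.04, 0.01], [0.01, 0.09]]
x = [0.5, 0.5]
print(fitness(x, returns, cov, target=5.25, eps=0.6))
```

In this toy case the total return (5.5) exceeds the target, so the weight is 1 and the fitness reduces to one minus the portfolio variance.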

FUZZY  EVOLUTIONARY  PROGRAMMING  FOR  PORTFOLIO  SELECTION 

Consider  the  constrained  optimization  problem  of  portfolio  selection  as  presented  in  the  previous  section. 
The  following  evolutionary  algorithm  is  employed  to  solve  the  problem.  In  this  algorithm,  p  denotes  a 
sigmoid  function  that  compresses  the  positive  real  line  into  the  interval  [0,1],  and  the  expression 

represents  a  random  number  between  0  and  1,  with  rather  high  probability  to  be  close  to 

0  when  fk  is  large. 

Algorithm  1  ( Fuzzy  evolutionary  programming  for  portfolio  selection ) 

Generate a population of m vectors $x^k = (x_1^k, \ldots, x_N^k)$, $k = 1, \ldots, m$, where each $x_i^k$ is randomly taken
from the interval [0,1], $i = 1, \ldots, N$, such that $\sum_{i=1}^{N} x_i^k = 1$.

For $k = 1, \ldots, m$, compute the objective value:

$$f^k = \left( 1 - \sum_{i=1}^{N} \sum_{j=1}^{N} \sigma_{ij} x_i^k x_j^k \right) \mu_{\phi(E,\varepsilon)}\left( \sum_{i=1}^{N} x_i^k r_i \right) \qquad (4)$$

Repeat

For each $k = 1, \ldots, m$, generate the offspring $x^{m+k}$ by letting, for $i = 1, \ldots, N$,
$s_i^k = x_i^k$ if the sign - is chosen, and $1 - x_i^k$ if the sign + is chosen;


Sr*  - 


In 

f  1  1 

if  >i0-' 


s*  otherwise 

$x_i^{m+k} = x_i^k \pm \delta x_i^k$, (+ or - is chosen at random)

and compute the corresponding objective value $f^{m+k}$, using Equation 4.

For each $k = 1, \ldots, 2m$, select a random set U of c indices from 1 to 2m, and record the number
$w^k$ of $h \in U$ such that $f^h \le f^k$.

Select m vectors from the set $\{x^k : k = 1, \ldots, 2m\}$ that have the highest scores $w^k$ to form the next
generation of the population.

Until $|f^0 - f^m| < \epsilon$ or the time allowed is exhausted.


The  above  algorithm  was  used  to  select  an  optimal  portfolio  of  forty  securities  from  the  Australian  stock 
market.  The  experimental  results  are  discussed  in  the  next  section. 
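
The scoring and selection steps of Algorithm 1 can be sketched as follows. This is an illustration rather than the author's program: the function names are hypothetical, and drawing the comparison set U without replacement is an assumption, since the text only says that U is a random set of c indices.

```python
import random

def tournament_scores(fitness, c, rng):
    """For each candidate k, count the members h of a random index set U (|U| = c)
    whose objective value satisfies f^h <= f^k."""
    n = len(fitness)
    scores = []
    for k in range(n):
        opponents = rng.sample(range(n), c)   # assumption: sampled without replacement
        scores.append(sum(1 for h in opponents if fitness[h] <= fitness[k]))
    return scores

def select_survivors(population, fitness, m, c, rng):
    """Keep the m candidates with the highest scores w^k."""
    scores = tournament_scores(fitness, c, rng)
    order = sorted(range(len(population)), key=lambda k: scores[k], reverse=True)
    return [population[k] for k in order[:m]]

# Parents plus offspring (2m = 4 candidates) with made-up objective values.
candidates = ["a", "b", "c", "d"]
values = [0.1, 0.5, 0.3, 0.9]
rng = random.Random(0)
print(tournament_scores(values, c=4, rng=rng))            # c = 2m: each score is the rank
print(select_survivors(candidates, values, m=2, c=4, rng=rng))
```

With c smaller than 2m the scores become noisy estimates of rank, so weaker candidates occasionally survive, which keeps some diversity in the population.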

EXPERIMENTAL  RESULTS 

We  chose  forty  securities  currently  available  on  the  Australian  stock  market,  the  names  of  which  are  listed 
below.  Henceforth,  for  convenience  we  will  refer  to  these  securities  by  their  listed  numbers. 


 1. AGL          11. Comalco      21. LangCorp     31. QBE
 2. ANZ          12. Email        22. Leighton     32. Rock Big
 3. Amcor        13. Fairfax      23. Linden       33. Rothmans
 4. Amwy Asia    14. GIO Aust     24. Nat Foods    34. Seaworld
 5. Blackmore    15. GPT unit     25. News Corp    35. Seven Net
 6. Boral        16. Greens       26. OPSMPr       36. UO Aust
 7. Burswood     17. Hancock      27. Oldfields    37. WA News
 8. Cadbury      18. I Drug Tc    28. POSN         38. Wfield HI
 9. Caltex       19. Incitec      29. Petaluma     39. Woolworths
10. Coles Myer   20. Just Jeans   30. Prime TV     40. Yates A

The  monthly  yields  from  1994  to  1998  of  the  above  listed  securities  were  obtained  from  the  business 
section  of  the  Canberra  Times,  then  the  expected  returns  and  variances  of  individual  securities  and  the 
covariances  of  pairs  of  securities  were  computed  accordingly.  The  securities’  expected  returns  and 
variances are shown in Table 1.


Table 1: Expected returns and variances of the listed securities

 n   r_i        σ_i²       |  n   r_i        σ_i²       |  n   r_i        σ_i²       |  n   r_i        σ_i²
 1   4.472000   0.497816   | 11   1.991000   0.758129   | 21   3.086000   0.773664   | 31   3.903000   0.601921
 2   5.089000   0.383469   | 12   5.771000   1.676289   | 22   3.823000   0.882741   | 32   6.909999   2.348640
 3   4.199000   0.581369   | 13   2.964000   0.313804   | 23   1.819000   0.029649   | 33   7.669000   14.347868
 4   2.557000   2.035761   | 14   5.223000   2.753821   | 24   3.942000   0.774856   | 34   8.295000   4.453125
 5   5.898000   4.174596   | 15   8.168000   0.341396   | 25   0.397000   0.008901   | 35   4.077001   0.688021
 6   5.205999   0.910164   | 16   2.940000   2.629520   | 26   5.092000   0.662936   | 36   7.208000   5.864076
 7   6.870000   3.892480   | 17   5.169000   0.336229   | 27   5.109000   0.620929   | 37   6.461000   3.793409
 8   2.908000   0.810956   | 18   2.623000   0.548481   | 28   4.881001   1.007369   | 38   1.798000   0.167956
 9                         | 19   5.043000   0.600441   | 29   2.375000   0.612125   | 39   4.390000   2.174800
10   4.162000   0.522016   | 20   4.615000   1.322225   | 30   3.733000   1.030001   | 40   5.101000   1.827329


Algorithm 1 was used to select an optimal portfolio of the above forty securities. The expected total return
was set at 5.25, which is 1.5% higher than the government bond interest. For the fuzzy membership function
representing the constraint on expected total return, we let p = 1 and ε = 0.6. The program results are
recorded in Figure 1, where the continuous line represents the fitness of the best candidate, and the dotted
line shows that of the worst candidate. We observe that the fitness of both candidates converges rapidly. An
optimal portfolio was found and is shown in Table 2.


Table  2.  An  optimal  portfolio  obtained  by  fuzzy  evolutionary  programming. 


0.000563  0.023234  0.007975  0.000545  0.002791  0.004772  0.069821  0.000270
0.000869  0.002115  0.027605  0.000056  0.000646  0.002880  0.105348  0.062290
0.000495  0.046219  0.099688  0.102540  0.067659  0.004849  0.000521  0.001846
0.000895  0.000007  0.002513  0.043807  0.000003  0.001972  0.048508  0.000150
0.012759  0.067761  0.000717  0.006255  0.069103  0.000565  0.072669  0.036720


The  optimal  portfolio  shown  in  Table  2  corresponds  to  a  risk  value  of  0.160  with  a  degree  of  confidence 
0.998  of  producing  the  prespecified  expected  total  return.  This  solution  attains  a  fitness  of  0.838. 
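
As a quick arithmetic check, and assuming the fitness is the redefined objective (one minus the risk, multiplied by the degree of satisfaction of the return constraint), the reported figures are mutually consistent:

```latex
(1 - 0.160) \times 0.998 = 0.83832 \approx 0.838
```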


Fig.  1:  Convergence  of  fitness  in  portfolio  selection 


Note that the optimal portfolio's level of risk is less than one tenth of the average risk of all securities. It is
interesting that the prespecified expected return of 5.25 can also be attained (with a degree of confidence of
1) at the price of a slightly higher level of risk, i.e., 0.164. This portfolio is shown in Table 3.


Table  3.  A  portfolio  attaining  the  expected  total  return  with  slightly  higher  risk 


0.001641  0.021828  0.025770  0.001682  0.009418  0.014278  0.065408  0.000935
0.002860  0.006620  0.021164  0.000186  0.002185  0.009830  0.104936  0.049695
0.001464  0.042652  0.081119  0.085430  0.061873  0.015389  0.001613  0.006067
0.002955  0.000024  0.008586  0.044122  0.000009  0.007517  0.038231  0.000482
0.011672  0.064489  0.002329  0.021459  0.063397  0.001885  0.068075  0.030726


The portfolio shown in Table 3 is not recommended by our program, however, as its fitness is only 0.836,
which is lower than that of the optimal portfolio shown in Table 2. From Table 2, we also observe that the
securities numbered 0, 3, 7, 8, 11, 12, 16, 22, 24, 25, 28, 31, 34, 37 have fairly low rates of investment. This
suggests that no money should be invested in those securities. A check of the table of variances confirms
that the listed securities have rather high risks. Therefore, we excluded those securities from the original list
and reran Algorithm 1 to select an optimal portfolio of the remaining securities. The result shows that a new
optimal portfolio is obtained with a level of risk of 0.107, and with the expected total return fully achievable
(with a confidence degree of 1). This suggests that we may expect a higher total return than that previously
specified. In fact, when we ran the program again with a new expected total return of 5.84, the optimal
portfolio found achieved this expected return with a confidence degree of 0.998 and with a risk level of
0.181, which is still far below the average risk of all securities.




FUZZY  EVOLUTIONARY  PROGRAMMING 
FOR  STOCHASTIC  PORTFOLIO  SELECTION 

In  the  problem  of  portfolio  selection,  the  constraint  on  expected  total  return  may  be  expressed  differently  as 
follows.  The  investor  may  be  interested  in  the  likelihood  that  the  expected  total  return  will  be  above  some 
prespecified  threshold.  That  is,  the  probability  that  the  expected  total  return  is  higher  than  the  prespecified 
threshold  must  be  greater  than  some  predetermined  level.  So,  the  portfolio  selection  problem  is  redefined  as 


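Minimize $\sum_{i=1}^{N} \sum_{j=1}^{N} \sigma_{ij} x_i x_j$

Subject to $\sum_{i=1}^{N} x_i = 1$,

$P\left( \sum_{i=1}^{N} x_i r_i \ge E \right) \ge 1 - \alpha$,  (5)

$x_i \ge 0,\ i = 1, \ldots, N$

Suppose that the returns $r_i$ have normal distributions with mean $\bar{r}_i$ and standard deviation $\sigma_i$.
Then $h = \sum_{i=1}^{N} x_i r_i$ has mean $\bar{h} = \sum_{i=1}^{N} x_i \bar{r}_i$ and variance $\sigma_h^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} \sigma_{ij} x_i x_j$. Also let $K_\alpha$ be such that
$P(X \le K_\alpha) = \alpha$, where X is a standard normal variable. Then $P(h \ge E) \ge 1 - \alpha$ if and only if

$$\frac{E - \bar{h}}{\sigma_h} \le K_\alpha$$

It follows that constraint (5) is equivalent to

$$\sum_{i=1}^{N} x_i \bar{r}_i + K_\alpha \sigma_h \ge E$$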

Thus,  the  stochastic  portfolio  selection  problem  is  reduced  to  a  standard  portfolio  selection  problem,  which 
can  be  solved  by  the  technique  of  fuzzy  evolutionary  programming  as  discussed  in  the  previous  section. 
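
The reduction of the probabilistic constraint to a deterministic one can be illustrated numerically. The sketch below assumes jointly normal returns, so that the portfolio return is normal with mean $\sum x_i \bar{r}_i$ and variance $\sum\sum \sigma_{ij} x_i x_j$; the function names and the two-security data are hypothetical.

```python
import math
from statistics import NormalDist

def portfolio_moments(x, mean_returns, cov):
    """Mean and standard deviation of h = sum_i x_i r_i."""
    n = len(x)
    mean_h = sum(x[i] * mean_returns[i] for i in range(n))
    var_h = sum(cov[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return mean_h, math.sqrt(var_h)

def chance_constraint_holds(x, mean_returns, cov, target, alpha):
    """Deterministic equivalent: mean_h + K_alpha * sd_h >= E, with P(X <= K_alpha) = alpha."""
    mean_h, sd_h = portfolio_moments(x, mean_returns, cov)
    k_alpha = NormalDist().inv_cdf(alpha)   # negative for alpha < 0.5
    return mean_h + k_alpha * sd_h >= target

def prob_return_at_least(x, mean_returns, cov, target):
    """Direct evaluation of P(h >= E) under the normality assumption."""
    mean_h, sd_h = portfolio_moments(x, mean_returns, cov)
    return 1.0 - NormalDist(mean_h, sd_h).cdf(target)

# Toy data (made-up numbers).
mean_returns = [5.0, 6.0]
cov = [[0.04, 0.01], [0.01, 0.09]]
x = [0.5, 0.5]
for alpha in (0.05, 0.10):
    print(alpha,
          chance_constraint_holds(x, mean_returns, cov, 5.25, alpha),
          prob_return_at_least(x, mean_returns, cov, 5.25) >= 1.0 - alpha)
```

For each value of α the two tests agree, which is the equivalence used to turn the stochastic problem back into a standard one.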

REFERENCES 

1. E.J. Elton and M.J. Gruber, 1995. Modern Portfolio Theory and Investment Analysis. John Wiley &
Sons, New York.

2. H. Konno and K. Suzuki, 1992. A fast algorithm for solving large scale mean-variance model by
compact factorization of covariance matrices. J. Operations Research Society of Japan, 35, 93-104.

3. T.V. Le, 1996. A fuzzy evolutionary approach to constrained optimization problems. Proc. IEEE Int.
Conf. on Evolutionary Computation, Nagoya, Japan, May, 274-278.

4. H.M. Markowitz, 1987. Mean-Variance Analysis in Portfolio Choice and Capital Markets, Blackwell.

5. A. Perold, 1984. Large scale portfolio optimization. Management Science, 30, 1143-1160.

6. T. Watanabe, K. Oda, J. Watada, 1998. Strategic decision of investment by a Boltzmann machine. Proc.
of VJFuzzy98, HaLong Bay, Oct., 201-208.




Design  of  a  Region-Wise  Fuzzy  Sliding 
Mode  Controller  with  Fuzzy  Tuner 

Chung-Chun Kung and Wei-Chi Lai

Department  of  Electrical  Engineering,  Tatung  Institute  of  Technology, 
40  Chungshan  North  Road,  3rd  Sec.,  Taipei,  Taiwan,  R.  O.  C. 

Tel:  (886)-2-25925252  Ext.  3473 
E-mail:  cckung@ctr3.ee.ttit.edu.tw 


ABSTRACT 

In this paper, a region-wise fuzzy sliding mode controller (RFSMC) is proposed. In designing the
RFSMC, we first employ sliding mode control techniques to design the fuzzy
control rules and obtain the fuzzy sliding mode controller (FSMC). Second, we adopt the
concepts of the region-wise linear fuzzy controller to redesign the FSMC as the region-wise fuzzy
sliding mode controller (RFSMC). Then, based on the state values of the controlled plant, a fuzzy
tuner (FT) is used to tune the output scaling factor of the RFSMC. Finally, a genetic algorithm
(GA) is applied to search for the optimal parameters of the RFSMC. The simulation results show that the
proposed RFSMC has the following advantages: (1) the number of fuzzy control rules of the RFSMC is
effectively reduced; (2) it exhibits better performance than the FSMC.


INTRODUCTION 

The fuzzy logic controller (FLC) has been successfully applied to complex ill-defined processes
with better performance than that of conventional controllers [1-3]. However, several
difficulties still exist in FLC design: (1) fuzzy control rules are experience oriented; (2) the
characteristics of the fuzzy control system cannot be pre-specified; (3) it is difficult to find an
optimal fuzzy controller.

In [4-8], the FSMC was presented to overcome difficulties (1) and (2). The
FSMC derives the control signal that forces the states of the controlled plant to converge to the sliding
surface and then stay on it. Thus, the characteristics of the closed-loop control system can be specified
by a pre-defined sliding surface.

Although the FSMC is a good solution for designing fuzzy logic control systems, it has two input
variables, so the number of rules in a complete fuzzy rule base equals $m^2$ (where m is
the number of input linguistic labels). Thus the number of fuzzy rules in the FSMC grows
quadratically as m increases. In this paper, we adopt the concepts of the region-wise linear fuzzy
controller [9] to redesign the FSMC as the region-wise fuzzy sliding mode controller (RFSMC).
We combine the two input variables of the FSMC into a single new input variable for the RFSMC. Thus, the
number of fuzzy control rules in the RFSMC is equal to m (where m is the number of linguistic
labels of the new input variable). Then, a fuzzy tuner (FT) is utilized to tune the scaling factor of
the RFSMC. Finally, a GA is applied to search for the optimal parameters of the RFSMC and FT. The
simulation results show that the RFSMC with FT has better performance than the FSMC.


ELITIST  GENETIC  ALGORITHM 

GAs are global search algorithms based on the mechanics of natural genetics and natural selection
[11,12]. Unlike many classical optimization techniques, GAs do not rely on computing local
derivatives to guide the search process. GAs also include random elements, which help to avoid
being trapped in local optima. So GAs can provide a means to search poorly understood and
irregular spaces. They have been used mainly as function optimizers and have been proved to be
effective global optimization tools, especially for multimodal and non-continuous functions. In this
paper, we use the elitist GA to find the optimal parameter values of the fuzzy controller. The elite selection
procedure ensures the survival of the fittest individual in each generation.
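
The elite-preserving idea can be sketched in a few lines. The code below is a generic illustration, not the GA used in the paper: the single real-valued parameter, the binary tournament, the arithmetic crossover and the Gaussian mutation are all assumptions chosen for brevity. Only the elitism step (copying the best individual unchanged into the next generation) reflects the text.

```python
import random

def elitist_ga(fitness, bounds, pop_size=20, generations=50,
               crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Maximize `fitness` over one real parameter; the best individual always survives."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        elite = max(pop, key=fitness)
        best_history.append(fitness(elite))
        children = [elite]                       # elitism: carried over unchanged
        while len(children) < pop_size:
            # Two binary tournaments pick the parents (assumed operator).
            a, b = (max(rng.sample(pop, 2), key=fitness) for _ in range(2))
            child = (a + b) / 2.0 if rng.random() < crossover_rate else a
            if rng.random() < mutation_rate:
                child += rng.gauss(0.0, 0.1 * (hi - lo))
            children.append(min(hi, max(lo, child)))
        pop = children
    return max(pop, key=fitness), best_history

def objective(x):
    """Toy fitness with its maximum at x = 2."""
    return -(x - 2.0) ** 2

best, history = elitist_ga(objective, (-10.0, 10.0))
print(best, history[0], history[-1])
```

Because the elite is copied forward, the best fitness recorded per generation can never decrease, which is the property the elite selection procedure is meant to guarantee.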


THE  REGION-WISE  FSMC  WITH  FUZZY  TUNER 

SLIDING  MODE  CONTROLLER 

Consider a class of MIMO nonlinear systems of the form:

$$x_j^{(n_j)}(t) = f_j(X_1, \ldots, X_p; t) + b_j(X_1, \ldots, X_p; t)\, u(t) + d_j(t), \quad j = 1, 2, \ldots, p \qquad (1)$$

where

$$X_i = [x_i, \dot{x}_i, \ldots, x_i^{(n_i - 1)}]^T,\ i = 1, 2, \ldots, p, \qquad b_j = [b_{j1}, b_{j2}, \ldots, b_{jq}], \qquad u = [u_1, u_2, \ldots, u_q]^T,$$

and q and p are the numbers of input and output variables, respectively. It is assumed that all the $f_j$, $b_j$ and $d_j$
are unknown but bounded functions.

CONTROL OBJECTIVE: For a system given by Eq. (1), design a controller so that the state $X_j(t)$
of the system will track the desired trajectory $X_{dj}(t)$, for all $j = 1, \ldots, p$.

Let $X_{dj} = [x_{dj}(t), \dot{x}_{dj}(t), \ldots, x_{dj}^{(n_j - 1)}(t)]^T$ and $|x_{dj}^{(n_j)}(t)| \le v_j(t)$.

Then the tracking error vector can be written as follows:

$$E_j = X_j - X_{dj} = [e_j, \dot{e}_j, \ldots, e_j^{(n_j - 1)}]^T$$

Let us define a set of sliding surfaces $H_j(t)$ in the state space $R^{n_j}$ as

$$H_j(t): \left\{ X_j \,\middle|\, s_j(X_j, t) = \left( \frac{d}{dt} + \lambda_j \right)^{n_j - 1} e_j = 0 \right\}, \quad j = 1, \ldots, p$$

It is known that if there exists a positive constant $\eta_j$ such that [13-14]

(2)

then the state trajectories will reach the sliding surface $H_j(t)$ and then remain on the surface
$s_j(X_j, t) = 0$, for all $t > 0$. In the sliding region, the system error will converge to zero asymptotically.


THE  FUZZY  SLIDING  MODE  CONTROLLER  DESIGN 

The FSMC for the system of Eq. (1) is shown in Fig. 1 and its linguistic rule base is summarized
in Table 1, where

$S(kT) = s(kT) \cdot GS$,

$\dot{S}(kT) = \dot{s}(kT) \cdot GCS$,

$\Delta U(kT + T) = \mathrm{FSMC}[S(kT), \dot{S}(kT)]$,

$\Delta u(kT + T) = \Delta U(kT + T) \cdot GU$,

and the associated fuzzy subsets involved in the FSMC are as follows: NL = Negative Large, NM =
Negative Medium, NS = Negative Small, Z = Zero, PS = Positive Small, PM = Positive Medium, PL =
Positive Large; they are shown in Fig. 2.


Fig. 1. The block diagram of the FSMC.

Table 1: The linguistic rule base of FSMC

 \dot{S} \ S |  NL   NS   Z    PS   PL
 NL          |  PL   PL   PM   PS   Z
 NS          |  PL   PM   PS   Z    NS
 Z           |  PM   PS   Z    NS   NM
 PS          |  PS   Z    NS   NM   NL
 PL          |  Z    NS   NM   NL   NL

Fig. 2. Membership functions of the fuzzy variables $S$, $\dot{S}$ and $\Delta U$ (labels NL, NS, Z, PS, PL for $S$ and $\dot{S}$; NL, NM, NS, Z, PS, PM, PL for $\Delta U$).


REGION-WISE  FUZZY  SLIDING  MODE  CONTROLLER 

Since the FSMC has two input variables, $S$ and $\dot{S}$, the number of fuzzy control rules in a complete
rule base of the FSMC equals $m^2$ (m is the number of fuzzy sets for $S$ and $\dot{S}$). Thus, the
complexity of the FSMC grows quadratically as m increases. To reduce the number of fuzzy
control rules in the FSMC, we adopt the concepts of the region-wise linear fuzzy controller [9] to
redesign the FSMC as the region-wise fuzzy sliding mode controller (RFSMC).

To design the RFSMC, we define

$$S^* = 2(S + \dot{S})/3 \qquad (3)$$

as the input to the RFSMC, whose structure is shown in Fig. 3. Let the input and output fuzzy
variables of the RFSMC have seven linguistic labels, which are denoted by PL, PM, PS, Z, NS, NM, and
NL as shown in Fig. 4. The relationship between $S^*$, $S$ and $\dot{S}$ is listed in Table 2. Based on Table 2,
the rule base shown in Table 1 for the FSMC is equivalent to the rule base shown in Table 3 for
the RFSMC. Since the RFSMC has only seven fuzzy if-then rules in its rule base, it is much simpler
than the FSMC. The change of the control signal for the RFSMC can be calculated by

$$\Delta u(kT + T) = \mathrm{RFSMC}[S^*(kT + T)] \cdot GU \qquad (4)$$


Fig. 3. The block diagram of RFSMC.

Fig. 4. Membership functions of the fuzzy variables $S^*$ and $\Delta U$ for RFSMC (labels NL, NM, NS, Z, PS, PM, PL for both).

Table 2: The relationship between $S^*$, $S$ and $\dot{S}$

 \dot{S} \ S |  NL   NS   Z    PS   PL
 NL          |  NL   NL   NM   NS   Z
 NS          |  NL   NM   NS   Z    PS
 Z           |  NM   NS   Z    PS   PM
 PS          |  NS   Z    PS   PM   PL
 PL          |  Z    PS   PM   PL   PL

Table 3: The rule base for the RFSMC.

 S^*       |  NL   NM   NS   Z    PS   PM   PL
 \Delta U  |  PL   PM   PS   Z    NS   NM   NL
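
The claimed equivalence between the 25-rule FSMC table and the 7-rule RFSMC table can be checked mechanically. The sketch below is an illustration under an assumption that is not stated in the paper: the five labels of $S$ and $\dot{S}$ are taken to be centred at -1, -0.5, 0, 0.5, 1 and the seven labels of $S^*$ at multiples of 1/3, so that $S^* = 2(S + \dot{S})/3$ maps label indices i and j to the saturated index i + j. All names are hypothetical.

```python
S_LABELS = ["NL", "NS", "Z", "PS", "PL"]                  # indices -2..2, value = index / 2
SSTAR_LABELS = ["NL", "NM", "NS", "Z", "PS", "PM", "PL"]  # indices -3..3, value = index / 3

def s_star_label(s_label, sdot_label):
    """Table 2: label of S* = 2(S + Sdot)/3, saturated to the range [NL, PL]."""
    i = S_LABELS.index(s_label) - 2
    j = S_LABELS.index(sdot_label) - 2
    n = max(-3, min(3, i + j))        # 2 * (i/2 + j/2) / 3 = (i + j) / 3
    return SSTAR_LABELS[n + 3]

def rfsmc_rule(sstar_label):
    """Table 3: the output label is the mirror image of the S* label."""
    n = SSTAR_LABELS.index(sstar_label) - 3
    return SSTAR_LABELS[-n + 3]

# Table 1 (FSMC rule base); each row lists the outputs for the columns NL, NS, Z, PS, PL.
TABLE_1 = {
    "NL": ["PL", "PL", "PM", "PS", "Z"],
    "NS": ["PL", "PM", "PS", "Z", "NS"],
    "Z":  ["PM", "PS", "Z", "NS", "NM"],
    "PS": ["PS", "Z", "NS", "NM", "NL"],
    "PL": ["Z", "NS", "NM", "NL", "NL"],
}

equivalent = all(
    rfsmc_rule(s_star_label(col, row)) == TABLE_1[row][c]
    for row in S_LABELS
    for c, col in enumerate(S_LABELS)
)
print(equivalent)
```

Since Table 1 is symmetric, it does not matter which of $S$ and $\dot{S}$ is taken as the row variable in this check.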

RFSMC  WITH  FUZZY  TUNER 

In designing the fuzzy control system, the choice of a suitable scaling factor for the output of the fuzzy
controller is an important task [10]. A large scaling factor is needed to enlarge the
change of the control signal so that the control signal can converge quickly to the desired value, while a small
scaling factor is needed to shrink the change of the control signal so that the control
signal can converge smoothly to its desired value. In this section, we design a fuzzy tuner (FT) to offer a
suitable scaling factor for the output of the RFSMC. The structure of the FT is shown in Fig. 5.


Fig. 5. The block diagram of the FT

From Eq. (2), we must design the control signal so that

$$s\dot{s} \le -\eta s^2 \qquad (5)$$

A sufficient condition for Eq. (5) to be satisfied is that $\dot{s} + \eta s = 0$, or equivalently,

$$\dot{s} + \eta s = \frac{\dot{S}}{GCS} + \eta \frac{S}{GS} = 0 \qquad (6)$$

By multiplying both sides of Eq. (6) by GS and letting $\eta = GS/GCS$, Eq. (6) becomes:

$$S + \dot{S} = 0 \qquad (7)$$

Let $G \cdot |S + \dot{S}|$ be the input variable of the FT (which is bounded in [0,1]) and GU the output of the FT. In
order to satisfy Eq. (7), we can design the fuzzy rules of the FT as shown in Table 4, and the membership
functions as shown in Fig. 6.

Table 4: The fuzzy rule base of the FT

 G \cdot |S + \dot{S}|  |  S        M        L
 GU                     |  \phi_1   \phi_2   \phi_3

Fig. 6. Membership functions of the fuzzy variables $G \cdot |S + \dot{S}|$ and GU.


To obtain the optimal RFSMC and FT, we apply the elitist GA to search for the optimal values of $c$,
$\phi_1$, $\phi_2$, $\phi_3$, $G$, $GS$ and $GCS$. The change of the control signal derived by the RFSMC with an FT is
then given by

$$\Delta u(kT + T) = \mathrm{RFSMC}[S^*(kT + T)] \cdot GU(kT + T)$$

where

$$GU(kT + T) = \mathrm{FT}[\,G \cdot |S(kT + T) + \dot{S}(kT + T)|\,]$$
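
A three-rule tuner of this kind can be sketched as follows. The shapes of the membership functions are not recoverable from the text, so this sketch assumes triangular sets S, M and L centred at 0, 0.5 and 1, singleton outputs $\phi_1$, $\phi_2$, $\phi_3$, and weighted-average defuzzification; the function names and the numeric values of the singletons are hypothetical.

```python
def tuner_input(G, S, S_dot):
    """Input of the FT: G * |S + S_dot|, clipped to the interval [0, 1]."""
    return min(1.0, max(0.0, G * abs(S + S_dot)))

def fuzzy_tuner(z, phi):
    """Three rules (S -> phi1, M -> phi2, L -> phi3) with assumed triangular memberships."""
    z = min(1.0, max(0.0, z))
    w_small = max(0.0, 1.0 - 2.0 * z)             # peak at z = 0
    w_medium = max(0.0, 1.0 - abs(2.0 * z - 1.0)) # peak at z = 0.5
    w_large = max(0.0, 2.0 * z - 1.0)             # peak at z = 1
    total = w_small + w_medium + w_large
    return (w_small * phi[0] + w_medium * phi[1] + w_large * phi[2]) / total

phi = (1.0, 2.0, 4.0)   # illustrative singleton values, not the optimized ones
for z in (0.0, 0.25, 0.5, 1.0):
    print(z, fuzzy_tuner(z, phi))
```

With increasing singletons the scaling factor GU grows as the state moves away from the line $S + \dot{S} = 0$ and shrinks near it, which matches the stated purpose of the tuner.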


SIMULATION  RESULTS 

To compare the performance of the proposed RFSMC with that of the FSMC, an inverted pendulum was simulated.
Consider an inverted pendulum system described by the following differential equations [15]:

$$\dot{\theta}_1 = \theta_2$$

$$\dot{\theta}_2 = \frac{g \sin\theta_1 - \cos\theta_1 \left( \dfrac{mL}{M + m}\, \theta_2^2 \sin\theta_1 + \dfrac{1}{M + m}\, u \right)}{\dfrac{4}{3} L - \dfrac{mL}{M + m} \cos^2\theta_1}$$

We set the parameters in this system as M = 1 kg, m = 0.1 kg, L = 0.5 m, g = 9.8 m/s², and sampling
time = 0.005 sec. The control objective is to design a fuzzy controller so that the output angle $\theta$ can
track the desired trajectory $\theta_d$.


In order to meet the control objective, we define the sliding surface as follows:

$$H: \{ X \mid s(X; t) = \dot{e} + 20e = 0 \}, \quad \text{where } e = \theta - \theta_d$$

The following two cases are simulated to compare the performance of the RFSMC with that of the FSMC.

CASE 1: $\theta(0) = 1.0$ rad, $\dot{\theta}(0) = 0$ rad/sec, and $\theta_d(t) = 0$ rad.

CASE 2: $\theta(0) = 1.0$ rad, $\dot{\theta}(0) = 0$ rad/sec, and $\theta_d(t) = \sin(t)$ rad.
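
The plant and the sliding variable used in these cases can be sketched as below. This is an illustration under assumptions: the pendulum equation is taken in a common textbook form with the parameters quoted above, the sign convention for the control force u is assumed, and a simple forward-Euler step is used; none of the function names come from the paper.

```python
import math

M, m, L, g, T = 1.0, 0.1, 0.5, 9.8, 0.005   # parameters quoted in the text

def theta_ddot(theta, omega, u):
    """Angular acceleration of the pendulum (sign convention for u is an assumption)."""
    num = g * math.sin(theta) - math.cos(theta) * (
        m * L * omega ** 2 * math.sin(theta) + u) / (M + m)
    den = 4.0 * L / 3.0 - m * L * math.cos(theta) ** 2 / (M + m)
    return num / den

def euler_step(theta, omega, u, dt=T):
    """One forward-Euler step of the pendulum state (theta, omega)."""
    return theta + dt * omega, omega + dt * theta_ddot(theta, omega, u)

def sliding_variable(theta, omega, theta_d, omega_d):
    """s = e_dot + 20 e with e = theta - theta_d."""
    e = theta - theta_d
    return (omega - omega_d) + 20.0 * e

# Case 1 initial condition: theta(0) = 1.0 rad, theta_dot(0) = 0, theta_d = 0.
theta, omega = 1.0, 0.0
s0 = sliding_variable(theta, omega, 0.0, 0.0)
theta, omega = euler_step(theta, omega, u=0.0)   # uncontrolled step: pendulum starts to fall
print(s0, theta, omega)
```

Starting far from the surface (s = 20 in Case 1), the controller's task is to drive s to zero, after which the error decays along $\dot{e} = -20e$.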

To compare the performance of the RFSMC with that of the FSMC, we define the cost function as follows:

$$J = \sum_{k=1}^{800} \big( kT\,|s(kT)| + 0.05\,|u(kT)| \big)$$

We adopt the resulting data of case 1 as training data for the GA to search for the optimal RFSMC and
FSMC. The fitness function for the GA is defined by:

$$F = \frac{2000}{\sum_{k} \big( kT\,|s(kT)| + 0.05\,|u(kT)| \big)}$$
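As a rough illustration, the cost and fitness can be evaluated from recorded samples of the sliding variable and the control signal. The sketch below assumes the time-weighted reading of the cost, J = Σ (kT·|s(kT)| + 0.05·|u(kT)|), with the sum starting at k = 1; the function names and the list-based interface are hypothetical.

```python
def cost(s_samples, u_samples, T=0.005):
    """Time-weighted cost: sum over k >= 1 of kT*|s(kT)| + 0.05*|u(kT)|."""
    return sum(
        k * T * abs(s) + 0.05 * abs(u)
        for k, (s, u) in enumerate(zip(s_samples, u_samples), start=1)
    )


def fitness(s_samples, u_samples, T=0.005):
    """GA fitness: 2000 divided by the cost (undefined if the cost is zero)."""
    return 2000.0 / cost(s_samples, u_samples, T)
```

Because the tracking term is weighted by kT, errors that persist late in the run are penalised more heavily than the initial transient, so a larger fitness corresponds to faster settling with less control effort.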

The  resulting  optimal  controllers  are  then  respectively  applied  to  control  the  system  of  case  2.  The 
simulation  results  are  shown  in  Table  5,  where  ts  is  defined  as  the  settling  time  of  S  . 




Table  5:  The  simulation  results 


            FSMC with FT         RFSMC with FT
            J         ts         J         ts
Case 1      167.54    0.505      150.11    0.48
Case 2      1272.5    0.52       1115.6    0.475

CONCLUSIONS 

In this paper, we have proposed a RFSMC with a FT and used a GA to search for its optimal
parameters. By combining the two input variables of the FSMC into a single new input variable, the number
of fuzzy control rules of the RFSMC increases only linearly rather than exponentially as the number of
input fuzzy labels increases. Hence the complexity of the fuzzy rule base is reduced. To improve the
performance of the RFSMC further, we combine the RFSMC with a FT, which uses three rules to tune the
scaling factor for the output of the RFSMC. Finally, the elitist GA is applied to search for optimal
parameters for the RFSMC and FT. The simulation results show that the RFSMC with FT exhibits better
performance than a conventional FSMC.


ACKNOWLEDGEMENT 

The authors would like to thank the National Science Council for providing financial support for this
research under Grant NSC88-2213-E-036-020.


REFERENCES 

1. C.C. Fuh, P.C. Tung, 1997. "Robust stability analysis of fuzzy control systems", Fuzzy Sets and
Systems, 88, 289-298.

2. G. Feng, S.G. Cao, N.W. Rees, C.K. Chak, 1997. "Design of fuzzy control systems with
guaranteed stability", Fuzzy Sets and Systems, 85, 1-10.

3. J.A. Johnson, H.B. Smartt, 1995. "Advantages of an alternative form of fuzzy logic", IEEE Trans.
Fuzzy Systems, 3(2), 149-157.

4. C.L. Chen, M.H. Chang, 1998. "Optimal design of fuzzy sliding-mode control: A comparative
study", Fuzzy Sets and Systems, 93, 37-48.

5. H.X. Li, H.B. Gatland, A.W. Green, 1997. "Fuzzy Variable Structure Control", IEEE SMC, 27(2),
306-312.

6. C.C. Kung, C.C. Liao, 1994. "Fuzzy-sliding mode control of nonlinear system", R.O.C.
Automatic Control Conference, 259-264.

7. C.C. Kung, S.C. Lin, 1992. "A fuzzy-sliding mode controller design", IEEE International
Conference on System Engineering, 1904-1905.

8. G.C. Hwang, S.C. Lin, 1992. "A stability approach to fuzzy control design for nonlinear systems",
Fuzzy Sets and Systems, 48, 279-287.

9. J.S. Taur, C.W. Tao, 1997. "Design and Analysis of Region-Wise Linear Fuzzy Controllers", IEEE
SMC Trans., 27(3).

10. H.Y. Chung, B.C. Chen, J.J. Lin, 1998. "A PI-type fuzzy controller with self-tuning scaling
factors", Fuzzy Sets and Systems, 93, 23-28.

11. J.H. Holland, 1975. Adaptation in Natural and Artificial Systems. Ann Arbor, Univ. Mich. Press.

12. D.E. Goldberg, 1989. Genetic Algorithms in Search, Optimization and Machine Learning.
Reading, MA, Addison-Wesley.

13. J.J. Slotine, 1984. "Sliding controller design for non-linear systems", Int. J. Control, 40(2), 421-434.

14. J.J. Slotine, S.S. Sastry, 1983. "Tracking control of non-linear systems using sliding surfaces with
application to robot manipulators", Int. J. Control, 38(2), 465-492.

15. M. Fliess, 1989. "Nonlinear control theory and differential algebra", in Modeling and Adaptive
Control, C.I. Byrnes and A. Kurzhansky, Eds., New York, Springer-Verlag.




A  MULTI-INPUT  CURRENT-MODE  FUZZY  INTEGRATED  CIRCUIT 

FOR  PATTERN  RECOGNITION 

Gu  Lin,  Bingxue  Shi 

Institute  of  Microelectronics,  Tsinghua  University,  Beijing,  100084,  China 


ABSTRACT 

A multi-input current-mode fuzzy recognition integrated circuit is proposed in this paper, which is
amenable to many pattern recognition applications. It can accept multiple inputs that represent multiple
features of an unknown pattern in a time-shared way. The principle of the fuzzy circuit is based on the
“Sum-Sorting” rule. In the fuzzy circuit, membership function generators employ current-mode circuits to
generate memberships corresponding to each standard pattern according to the input features.
Switched-current accumulators are adopted to realize the Sum function to get synthetic memberships. A sorting
circuit sorts all of the synthetic memberships based on their magnitudes, and finally the recognition results
are output. The fuzzy integrated circuit has been successfully manufactured in a 2 μm N-well standard digital
CMOS process. It has been applied to speaker-independent Chinese digit speech recognition with a high
recognition speed of 1.7×10^5 digits per second and a high recognition rate (the first recognition rate is
more than 90%, the second recognition rate is more than 98%).

INTRODUCTION 

Since fuzzy mathematics was established, it has found applications in expert systems, pattern recognition,
robotics, industrial control, etc. There are two ways to implement fuzzy processes. One is in software on a
digital computer, but it is difficult to work in real time this way. The other is directly in hardware,
which can achieve high-speed processing. Fuzzy hardware can be implemented in analog or digital circuits.
A digital fuzzy system is a special computer system; it takes advantage of mature digital VLSI technology,
but its scale is large [1]. An analog fuzzy system is composed of multiple-valued logic (MVL) circuit
elements. MVL circuits have two modes, current-mode and voltage-mode. Compared with voltage-mode
circuits, current-mode circuits realize sum and difference operations easily and have a large current range
and high integration density; in particular, current-mode circuits can be implemented in a standard CMOS
technology. Current-mode circuits are therefore employed in many fuzzy systems [2,3,4].

In this paper, a fuzzy integrated circuit for Chinese-digit speech recognition is proposed. Its structure is
amenable to many other pattern recognition applications. Experimental results show that the circuit
correctly realizes the recognition function and has the advantages of high recognition speed and high
recognition rate.

PRINCIPLE  OF  FUZZY  PATTERN  RECOGNITION 

The features of objective things often have some ambiguity. This ambiguity is described by a fuzzy set
characterized by a membership function. The foundation of fuzzy pattern recognition is the principle of
maximum membership. In fact, a standard pattern often has multiple fuzzy features. Let each of N standard
patterns have M fuzzy features. $A_{ij}$, i = 1, 2, ..., N; j = 1, 2, ..., M, represents the jth fuzzy feature
of the ith standard pattern. Then, each standard pattern becomes a fuzzy vector (or multifactorial fuzzy set)


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




$A_i = (A_{i1}, A_{i2}, \ldots, A_{iM})$, $1 \le i \le N$. Assuming that $u_0 = (u_1^0, u_2^0, \ldots, u_M^0)$ is the
unknown pattern to be recognized, each of its elements corresponds to a fuzzy feature. If there exists
$k \in \{1, 2, \ldots, N\}$ such that

$$\mu_{A_k}(u_0) = \max\{\mu_{A_1}(u_0), \ldots, \mu_{A_N}(u_0)\} \qquad (1)$$

then it is decided that $u_0$ relatively belongs to $A_k$, in which

$$\mu_{A_i}(u_0) = M_m\big(\mu_{A_{i1}}(u_1^0), \mu_{A_{i2}}(u_2^0), \ldots, \mu_{A_{iM}}(u_M^0)\big) \qquad (2)$$

$M_m(\cdot)$ is defined as a multifactorial synthetic function.

Equation (1) is the classical rule for multifactorial fuzzy pattern recognition. At present, almost all fuzzy
recognition integrated circuits are implemented based on this rule. According to this rule, these fuzzy
processors only find the standard pattern closest to the unknown pattern as the recognition result; that is,
the standard pattern having the largest synthetic membership is chosen as the result. However, with the
increase of system complexity and of the number N of standard patterns, and especially with the development
of multi-stage cascade systems, the above method can no longer meet system requirements. To improve system
performance, the system must be able to find the two or more standard patterns closest to the unknown
pattern based on the magnitudes of the synthetic memberships. To this end, the proposed fuzzy recognition
integrated circuit employs a sorting operation based on magnitude in the judgment part of the fuzzy
recognition process. This fuzzy processor can find the h (1 ≤ h ≤ N) standard patterns closest to the
unknown pattern. That is, if there exist $k_1, k_2, \ldots, k_h \in \{1, 2, \ldots, N\}$ such that

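$$\mu_{A_{k_1}}(u_0) = \max_{i=1,\ldots,N} \mu_{A_i}(u_0) \qquad (3)$$

$$\mu_{A_{k_2}}(u_0) = \max_{i=1,\ldots,N;\; i \ne k_1} \mu_{A_i}(u_0) \qquad (4)$$

$$\mu_{A_{k_h}}(u_0) = \max_{i=1,\ldots,N;\; i \ne k_1, k_2, \ldots, k_{h-1}} \mu_{A_i}(u_0) \qquad (5)$$

then it is decided that the h (1 ≤ h ≤ N) standard patterns closest to the unknown pattern $u_0$ are
$A_{k_1}, A_{k_2}, \ldots, A_{k_h}$, respectively.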

Generally, the synthetic function is defined in one of two ways: as a minimum function or as a sum function.
Compared with the minimum function, the sum function has better generalization ability, which means that
better recognition performance can be achieved for untrained data. So the sum function is employed in the
proposed fuzzy processor.
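The Sum-Sorting rule described above can be sketched in software as follows. This is a behavioural illustration only, not a model of the circuit: the nested-list layout `memberships[i][j]` (membership of the jth input feature in the ith standard pattern) and the function names are assumptions made for the example.

```python
def synthetic_memberships(memberships):
    """Sum synthetic function: one total per standard pattern."""
    return [sum(row) for row in memberships]


def top_h_patterns(memberships, h):
    """Indices of the h standard patterns with the largest synthetic membership."""
    totals = synthetic_memberships(memberships)
    order = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    return order[:h]


# Three standard patterns, three features each (made-up values).
example = [
    [0.2, 0.9, 0.1],
    [0.6, 0.5, 0.7],
    [0.1, 0.2, 0.3],
]
first_choice = top_h_patterns(example, 1)    # classical maximum-membership rule
first_and_second = top_h_patterns(example, 2)
```

Setting h = 1 reproduces the classical rule of Equation (1); larger h returns the ranked candidates that a later processing stage could choose between.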


DESIGN  OF  FUZZY  INTEGRATED  CIRCUIT 

The structure diagram of the fuzzy integrated circuit is shown in Fig. 1. The fuzzy integrated circuit
consists of membership function generators (MFG), switched-current accumulators (SIA), a sorting circuit,
a feature decoder and a clock circuit. Assuming that the feature code has K bits, every feature can take
2^K values. In order to decrease the pad number on the chip and share one feature decoder, the M features
are input to the processor in a time-shared way.




Fig. 1 The structure of the fuzzy processor


When the processor is started, a feature code is first input and decoded by the feature decoder. The
outputs of the feature decoder enter the membership function generators, and the memberships of the feature
with respect to the N standard patterns are obtained. The memberships are then accumulated by the
accumulators in a time-shared way to get the synthetic memberships. Finally, the sorting circuit sorts the
synthetic memberships based on their magnitudes and outputs the recognition results.


Fig. 2 The circuit diagram of the MFG

Membership  Function  Generator  (MFG) 

The MFG is shown in Fig. 2. MFG_ij (1 ≤ i ≤ N, 1 ≤ j ≤ M) is the membership function generator of the jth
fuzzy feature of the ith standard pattern. I_ij (1 ≤ i ≤ N, 1 ≤ j ≤ M) represents the membership value of
the jth fuzzy feature of the ith standard pattern. I_ij is composed of I_ijl (0 ≤ l ≤ D, D = 2^K − 1), which
represents the lth membership value of the jth fuzzy feature of the ith standard pattern. The current-mode
MFG employs a scaled current mirror. It is mainly composed of NMOS mirror transistors M2 and M3. The
output current, scaled according to the shape ratio of M3 to M4, corresponds to one membership value. The
analog switch transistor M1 is controlled by the output F_l (0 ≤ l ≤ D) of the feature decoder and the timing
signals C_j (1 ≤ j ≤ M), where the C_j are neighboring, non-overlapping pulses, each of which matches one
input feature. In Fig. 2, C̄_j (1 ≤ j ≤ M) is the inverted signal of C_j, and F̄_l (0 ≤ l ≤ D) is the inverted
signal of F_l.




Switched-Current  Accumulator  (SIA) 

The accumulator that performs the accumulation of memberships is implemented using a second-generation
RGC (Regulated-Gate Cascode) switched-current integrator. Switched-current circuits can be fabricated
directly in standard digital CMOS technology, and are easy to integrate in mixed analog-digital systems and
to implement in VLSI. At present, switched-current circuits are employed in many fields [5,6,7]. Fig. 3
shows the diagram of the switched-current accumulator. In Fig. 3, RGC1, RGC2 and RGC3 are fully identical
RGC units. SW0, SW1 and SW3 are analog switches controlled by the nonoverlapping clock signals CL1 and
CL2. Iout is the output current of the integrator in the accumulator. The relation between Iout and Iin in
the z-domain is

$$I_{out} = \frac{z^{-1}}{1 - z^{-1}}\, I_{in} \qquad (6)$$

From the above equation, the circuit shown in Fig. 3 performs the function of an integrator; that is, it
implements accumulation.
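In the time domain, the transfer function z^-1 / (1 − z^-1) corresponds to the difference equation y[n] = y[n−1] + x[n−1], i.e. a running sum delayed by one clock period. The following sketch is an idealised software model of that behaviour (no circuit non-idealities); the function name and the zero initial state are assumptions.

```python
def accumulate(inputs):
    """Ideal delayed accumulator: y[n] = y[n-1] + x[n-1], starting from zero."""
    outputs = []
    y = 0.0
    previous_input = 0.0
    for x in inputs:
        y = y + previous_input      # add the sample from the previous period
        outputs.append(y)
        previous_input = x
    return outputs


# Three membership currents followed by an idle period.
running_total = accumulate([1.0, 2.0, 3.0, 0.0])
```

After the M feature periods plus one extra period, the output equals the sum of all M membership currents, which is the synthetic membership under the sum function.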


Fig. 3 The circuit diagram of the SIA

Sorting  Circuit 

The current-mode sorting circuit is employed to sort the synthetic memberships based on their magnitudes
and to find the standard patterns closest to the unknown pattern [8]. For simplicity, a current-mode sorting
circuit based on magnitude is discussed taking three input currents as an example. The structure and timing
diagram of the sorting circuit are shown in Fig. 4. Fig. 5 shows the circuit diagram of the TRANS unit in
the sorting circuit.


Here, the operation principle of the sorting circuit is discussed. Referring to the circuit and timing diagram
shown in Fig. 4, first the signal Reset rises from low to high; this causes the level at the nodes CTk
(k = 0, 1, 2) in Block3 to go high and the level at the nodes Voutk (k = 0, 1, 2) to go low. Because the level
at the nodes CTk is high, Ik = Iink (k = 0, 1, 2) in Block1. In the sorting circuit shown in Fig. 4, Block2
mainly consists of a current-mode winner-take-all (WTA) circuit network, which is a laterally inhibitory
interconnected network with high resolution and high speed. In the WTA circuit, all of the NMOS transistors
corresponding to M21~M23 have the same W/L ratio. For convenience of description, Iin0 is assumed to be the
maximum input current, that is, Iin0 = max(Iin0, Iin1, Iin2) and I0 = max(I0, I1, I2). When the WTA circuit
works, it finds the maximum input current by the lateral inhibitory effect. This causes the level at node
VSout0 to be high and the levels at nodes VSout1 and VSout2 to be low. At the instant T1, the clock signal
CK goes high. Vout0 then becomes high since the level at node VSout0 is high, while Vout1 and Vout2 become
low since the levels at nodes VSout1 and VSout2 are low. When




the clock signal CK goes low at the instant T2, the level at the node CT0 becomes low and Vout0
becomes low, so that a high voltage pulse is generated on the Vout0 terminal. While CT1 and CT2 are still
high, Vout1 and Vout2 are still low. Among Vout0, Vout1 and Vout2, only Vout0 outputs one high-level
pulse; this shows that the terminal of input current Iin0 has the maximum input current value.


Fig. 4 The circuit and timing diagram of the sorting circuit


Fig. 5 The circuit diagram of the TRANS unit

Since the level at the node CT0 is low, the voltage at the node VSout0 cannot be passed to Block3, so
the levels at the nodes CT0 and Vout0 stay at zero until the next reset signal is activated. On the other
hand, the low level at the node CT0 makes the current I0 zero in Block1, so that I0 will not
influence the subsequent comparison operations. In this manner, Iin1 and Iin2 are compared and the larger
of them is determined by the process described above. A high voltage pulse is generated on
the corresponding Vout terminal. The remaining operations can be deduced accordingly. Finally, the sorted




results can be obtained. In the fuzzy integrated circuit shown in Fig. 4, the sorted results are output as
the recognition results.

In the sorting circuit, the number h of pulses of the signal CK can be chosen to meet different requirements.
For example, when h is equal to 1, the sorting circuit only executes the MAX function; when h is equal to N,
the number of input currents, the sorting circuit sorts all of the input currents.
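The clocked operation just described, a winner-take-all comparison followed by disabling the winner, can be mimicked by a short behavioural model. It is a sketch under simplifying assumptions (ideal comparison, ties resolved in favour of the lowest index) and is not a circuit simulation; the names `wta_sort` and `enabled` are hypothetical.

```python
def wta_sort(currents, h):
    """Return the input indices in the order their Vout pulses would appear.

    Each of the h clock pulses selects the largest still-enabled current
    (the WTA stage) and then disables that input (CT_k going low).
    """
    enabled = [True] * len(currents)    # all CT_k are high after Reset
    winners = []
    for _ in range(min(h, len(currents))):
        candidates = [i for i, on in enumerate(enabled) if on]
        k = max(candidates, key=lambda i: currents[i])
        winners.append(k)               # pulse on Vout_k
        enabled[k] = False              # I_k is forced to zero from now on
    return winners


max_only = wta_sort([3.0, 1.0, 2.0], 1)      # MAX function
full_order = wta_sort([3.0, 1.0, 2.0], 3)    # complete sort
```

Choosing h between 1 and N trades the number of clock pulses against the number of ranked candidates produced.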

From the above discussion, it is clear that the fuzzy integrated circuit is able to find the h (1 ≤ h ≤ N)
standard patterns closest to the unknown pattern based on the magnitudes of the synthetic memberships. This
capability improves the performance of the fuzzy system.


Table  1.  Measured  results  for  the  fuzzy  processor 


chip area                                       5.4 × 5.8 mm²
chip core area                                  3.6 × 4.1 mm²
pin number                                      40
transistor number                               about 4100
pattern number N                                11
frequency                                       2 MHz
recognition speed                               1.7 × 10^5 patterns/second
power supply                                    5 V
lowest power supply                             3.5 V
power consumption in the state of recognition   about 80 mW
power consumption in the state of wait          about 20 mW

Fig. 6  The  microphotograph  of  the  fuzzy  processor  chip 


EXPERIMENTAL  RESULTS 

The fuzzy integrated circuit has been manufactured in 2 μm N-well standard digital CMOS technology. The
microphotograph of the fuzzy processor chip is shown in Fig. 6. The fuzzy processor has been successfully
measured. The measured results are shown in Table 1.




This fuzzy processor has been successfully applied to speaker-independent Chinese digit speech
recognition including 11 digits. Recognition experiments show that the fuzzy processor has a high
recognition speed of 1.7×10^5 digits per second and a high recognition rate (the first recognition rate is
more than 90%, the second recognition rate is more than 98%).

SUMMARY 

A multi-input current-mode fuzzy integrated circuit is proposed in this paper, which can accept multiple
inputs that represent multiple features of an unknown pattern in a time-shared way. In this fuzzy system, the
Sum-Sorting operation is used. The structure of the fuzzy processor is amenable to many pattern
recognition applications. The fuzzy processor has been successfully manufactured in a 2 μm N-well standard
digital CMOS process. It has been applied to speaker-independent Chinese digit speech recognition with
a recognition speed of 1.7×10^5 digits per second and a high recognition rate (the first recognition rate is
more than 90%, the second recognition rate is more than 98%).


ACKNOWLEDGMENT 

This project (69636030) is supported by the National Natural Science Foundation of China.


REFERENCE 

1. M. Togai, H. Watanabe, 1986. A VLSI implementation of a fuzzy inference engine: toward an expert
system on a chip. Information Sciences, 38(2), 147-164.

2. Yamakawa, 1985. The design and fabrication of the current mode fuzzy logic semi-custom IC in the
standard CMOS IC Technology. Proc. 15th ISMVL, 76-82.

3. W. Current, 1994. Current-mode CMOS multiple-valued logic circuits. IEEE J. of Solid-State Circuits,
29(2), 95-107.

4. Gu Lin, Bingxue Shi, 1997. Novel switched-current fuzzy processor for pattern recognition. J.
Tsinghua University (Science and Technology), 37(9), 86-89.

5. Terri, S., Guojin, L., David, J., 1991. Switched-current circuit design issues. IEEE Solid-State
Circuit, 26(3), 192-201.

6. Gu Lin, Bingxue Shi, 1998. A novel high resolution switched-current sorter based on magnitude.
Chinese Journal of Semiconductors, 19(2), 144-150.

7. Rajaesh, H., David, J., Terri, S., 1993. Fully balanced CMOS current-mode circuits. IEEE Solid-State
Circuit, 28(5), 569-575.

8. Gu Lin, Bingxue Shi, 1999. A current-mode sorting circuit for pattern recognition. Proc. Second
International Conference on Intelligent Processing and Manufacturing of Materials (IPMM’99),
Honolulu, Hawaii.




A  Framework  for  Intelligent  Systems  based  on 
Vector  Annotated  Logic  Programs 

K.  Nakamatsu*,  Y.  Hasegawa*,  J.  Minoro  Abe**,  A.  Suzuki*** 

*  School  of  Humanity  of  Environment  Policy  and  Technology., 
Himeji  Institute  of  Technology,  Himeji,  Japan 
Email:  nakamatu@hept.himeii-tech.ac.jp:  ,  vumit@hept.himeii-tech.ac.jp 
**  Department  of  Informatics  (ICET),  Paulista  University,  Sao  Paulo,  Brazil 

Email:  imabe@Isi.usp.br 

***  Faculty  of  Information,  Shizuoka  University,  Hamamatsu,  Japan 


ABSTRACT 

This  paper  presents  a  framework  of  intelligent  reasoning  systems.  It  is  based  on  a  logic  programming 
system  called  VALPSN  (Vector  Annotated  Logic  Program  with  Strong  Negation)  and  its  stable  model 
computing  system.  We  introduce  an  overview  of  the  framework  and  describe  the  three  kinds  of 
nonmonotonic  theories,  default  theory,  defeasible  theory,  and  default  fuzzy  theory  that  can  be  translated 
into  VALPSNs.  We  also  show  that  these  three  kinds  of  nonmonotonic  reasoning  can  be  achieved  by 
computing  the  stable  models  of  the  VALPSNs. 

Keywords 

annotated  logic  program,  stable  model,  intelligent  system,  default  reasoning,  defeasible  reasoning, 
fuzzy  reasoning 


INTRODUCTION 

Annotated logics are a family of paraconsistent and multi-valued logics that are appropriate for
dealing with inconsistency or conflicts [3]. Generally, each atomic formula of annotated logics is explicitly
attached a truth value called an annotation. For example, let p be an atomic formula and μ an annotation;
then p : μ is an annotated atomic formula. There are two kinds of negation in annotated logics, an epistemic
negation and an ontological negation. The ontological negation is a strong negation, and we have
proposed ALPSN (Annotated Logic Program with Strong Negation), which can deal with nonmonotonic
reasoning, in [7,8].

In this paper, we introduce a new extended version of the ALPSN called VALPSN (Vector Annotated
Logic Program with Strong Negation) in order to formalize some kinds of intelligent reasoning. In VALPSN,
an annotation of a literal is a 2-dimensional vector such that the first and second components of the vector
indicate the amount of positive and negative knowledge in terms of the literal, respectively, and the
epistemic negation is defined as the exchange of the two components. For example, let q be a literal and
(2, 1) a vector annotation. Then a vector annotated literal q : (2, 1) is intuitively interpreted as "the
literal q is known to be true with strength 2 and false with strength 1", and ¬q : (2, 1) = q : (1, 2). The
details of VALPSN are formally described in the following sections.
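The exchange of components performed by the epistemic negation can be illustrated with a very small sketch. The class name `VectorAnnotation` and its field names are hypothetical, chosen only to mirror the informal reading given above.

```python
from typing import NamedTuple


class VectorAnnotation(NamedTuple):
    positive: int   # strength of "known to be true"
    negative: int   # strength of "known to be false"


def epistemic_negation(v: VectorAnnotation) -> VectorAnnotation:
    """Exchange the two components of a vector annotation."""
    return VectorAnnotation(v.negative, v.positive)


# The annotation of q : (2, 1) becomes (1, 2) under epistemic negation.
negated = epistemic_negation(VectorAnnotation(2, 1))
```

Applying the negation twice returns the original annotation, as expected of an exchange of components.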

We have shown that VALPSN can represent three kinds of reasoning: defeasible reasoning, default
reasoning, and fuzzy reasoning [6]. Therefore, we propose a framework for intelligent reasoning systems
based on VALPSN and its stable model computation, which can deal with these three kinds of reasoning.
We give an overview of the framework in the following section.


OVERVIEW  OF  THE  FRAMEWORK 

The  framework  consists  of  two  modules,  the  knowledge  translation  module  and  the  inference  module,  as 
shown  in  Figure  1.  The  inference  engine  of  this  system  is  a  stable  model  computing  system.  In  the 




framework, it is assumed that all input knowledge is represented in three kinds of theories: defeasible
theories, default theories, and fuzzy theories. Each theory is translated into a VALPSN in the knowledge
translation module, and the model of each theory is computed as the stable models of the VALPSNs in the
inference module. These stable models represent the inference results of the three kinds of nonmonotonic
reasoning: default, defeasible, and default fuzzy reasoning.


Fig. 1. A Framework for Intelligent Systems. (Defeasible, default, and fuzzy theories pass through the
translation module into VALPSNs, which are evaluated by the stable model computing system of the
inference module.)

We  describe  the  details  of  the  translation  process  and  the  inference  process  by  presenting  some  examples 
after  introducing  VALPSN. 


VALPSN 

In this section, we introduce ALPSN and its extended version, VALPSN. Generally, the set of truth values
T in annotated logics has an arbitrary complete lattice structure. ALPSN has the well-known 4-valued
lattice, Lattice-4, as the lattice of truth values. The ordering of the lattice is denoted by ≤ as usual.


Fig.  2. 4-valued  Lattice  Structure. 

Let A be a literal. A : μ is called an annotated literal and μ is called an annotation, where μ ∈ T.
Generally, annotated literals can be interpreted epistemically. For example, an annotated literal p : t is
interpreted as "p is known to be true (t)", and p : ⊤ is interpreted as "p is known to be both true (t) and
false (f)". In annotated logics, there are two kinds of negation, an epistemic negation (¬) and an
ontological one (~). The epistemic negation is defined as a mapping from annotations to annotations. For
example, ¬(⊥) = ⊥, ¬(t) = f, ¬(f) = t, ¬(⊤) = ⊤.


Definition  1. 

The (well-formed) formulas of annotated logics are defined as:

1. Any annotated literal is a formula.

2. If F1, F2, and F are formulas, then F1 ∧ F2, F1 ∨ F2, F1 → F2, ∀xF, and ∃xF are formulas.

The ontological negation ~ can be defined by the epistemic negation ¬. This ontological negation is a
strong negation that has all the properties of classical negation.




Definition  2.  (Strong  Negation)[3] 

Let A be any formula. ~A =def (A → ((A → A) ∧ ¬(A → A))).

The epistemic negation ¬ in front of (A → A) is not interpreted as a mapping between annotations, but
rather as a negation in the sense of classical logic.

Definition  3.  [2,7] 

Let L0, ..., Ln be any annotated literals. L1 ∧ ... ∧ Ln → L0 is called an annotated clause (a-clause), and
L1 ∧ ... ∧ Li ∧ ~Li+1 ∧ ... ∧ ~Ln → L0 is called an annotated clause with strong negation (asn-clause). An
ALP (Annotated Logic Program) and an ALPSN (Annotated Logic Program with Strong Negation) are
finite sets of a-clauses and asn-clauses, respectively.

We now address the semantics for ALPSN and assume that all interpretations have a Herbrand base B_P as
their domain of interpretation. Since T is a complete lattice, the Herbrand interpretation I of an ALPSN
P over T may be considered to be a mapping I : B_P → T. Usually, the interpretation I is denoted by the
set {(p : ⊔μ_i) | I ⊨ (p : μ_1) ∧ ... ∧ (p : μ_n)}, where ⊔μ_i denotes the least upper bound of
{μ_1, ..., μ_n}; this notation is used in the rest of this paper. We assume that every interpretation of
annotated logic programs is a Herbrand interpretation. The ordering ≤ on T is extended to interpretations
in a natural way and the notion of satisfaction is defined.

Definition  4.  [2] 

Let I1 and I2 be any interpretations and A be an annotated atom. I1 ≤ I2 =def (∀A ∈ B_P)(I1(A) ≤ I2(A)),
where I1(A), I2(A) ∈ T. An interpretation I is said to satisfy

• a formula F iff it satisfies every closed instance of F,

• a ground annotated atom (A : μ) iff I(A) ≥ μ,

• a ground annotated literal (¬A : μ) iff I(A) ≥ ¬(μ),

• a complex formula ¬F iff I does not satisfy F.

The satisfaction of the other formulas, F1 ∧ F2, F1 ∨ F2, F1 → F2, ∀xF, and ∃xF, is the same as in
classical logic.

The satisfaction is denoted by the usual symbol ⊨. From the above definition we obtain that, for any
formula F, I ⊨ ~F iff I ⊭ F.

Definition  5. 

Associated with every ALPSN P is a function T_P from Herbrand interpretations to themselves, together
with an upward iteration of it. For any ground instance B1 ∧ ... ∧ Bm ∧ ~C1 ∧ ... ∧ ~Cn → (A : μ) of an
asn-clause in an ALPSN P, T_P(I)(A) = ⊔{μ | I ⊨ B1 ∧ ... ∧ Bm ∧ ~C1 ∧ ... ∧ ~Cn}.

Let Δ be a special interpretation that assigns the truth value ⊥ to all members of the Herbrand base B_P.
Then the upward iteration of T_P is defined as: T_P ↑ 0 = Δ, T_P ↑ λ = ⊔_{α<λ} T_P(T_P ↑ α), for any
ordinals α, λ.

Then the following propositions hold (the proofs are in [2]): If a program P is an ALP, then

• T_P is a monotonic function,

• P has the unique least model, which is identical to the least fixed point of T_P,

• T_P ↑ ω is identical to the least fixed point of T_P.

We describe the stable model [4] of an ALPSN P taking these propositions into account. Let I be any
interpretation. P^I, the Gelfond-Lifschitz transformation of P with respect to I, is an ALP obtained from P
by deleting

(1) each clause that has a literal ~(C : μ) in its body with I ⊨ (C : μ), and

(2) all strongly negated literals in the bodies of the remaining clauses.

Since P^I has no strong negation, it has the unique least model, which is given by T_{P^I} ↑ ω [2,4].




Definition 6. (Stable Model of ALPSN) [4,7,8]

If I is a Herbrand interpretation of an ALPSN P, then I is called a stable model of P iff I = T_{P^I} ↑ ω.
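
Definition 6 can be checked mechanically for small ground programs. The following Python sketch is an illustration only, not the authors' implementation: the clause encoding (head atom, head annotation, positive body, strongly negated body), the function names, and the annotation strings "bot", "t", "f", "top" are all assumptions made here. Satisfaction of (A : μ) is read as I(A) ≥ μ, and the least model is obtained by applying the clauses repeatedly until nothing changes, which for a finite ground program reaches the same least fixed point as the upward iteration of T_P. The test data is the program and the two interpretations of Example 7.

```python
# Four-valued annotation lattice: bot <= t <= top and bot <= f <= top.
ORDER = {
    ("bot", "bot"), ("bot", "t"), ("bot", "f"), ("bot", "top"),
    ("t", "t"), ("t", "top"), ("f", "f"), ("f", "top"), ("top", "top"),
}

def leq(a, b):
    return (a, b) in ORDER

def join(a, b):
    """Least upper bound of two annotations."""
    if leq(a, b):
        return b
    if leq(b, a):
        return a
    return "top"  # t and f are incomparable; their join is top

def satisfies(interp, atom, ann):
    """I satisfies (atom : ann) iff I(atom) >= ann (assumed reading)."""
    return leq(ann, interp.get(atom, "bot"))

def reduct(program, interp):
    """Gelfond-Lifschitz transformation of a ground program w.r.t. interp."""
    kept = []
    for head, ann, pos, neg in program:
        if any(satisfies(interp, a, m) for a, m in neg):
            continue  # (1) delete clauses blocked by a satisfied ~(C : m)
        kept.append((head, ann, pos))  # (2) drop the remaining ~-literals
    return kept

def least_model(alp):
    """Least model of a negation-free program by iteration to a fixed point."""
    interp, changed = {}, True
    while changed:
        changed = False
        for head, ann, pos in alp:
            if all(satisfies(interp, a, m) for a, m in pos):
                new = join(interp.get(head, "bot"), ann)
                if new != interp.get(head, "bot"):
                    interp[head], changed = new, True
    return interp

def is_stable(program, interp):
    model = least_model(reduct(program, interp))
    atoms = set(model) | set(interp)
    return all(model.get(a, "bot") == interp.get(a, "bot") for a in atoms)

# Program of Example 7: (head, annotation, positive body, negated body)
P = [
    ("a", "t", [], [("b", "t")]),
    ("b", "t", [], [("a", "t")]),
    ("p", "t", [("a", "t")], []),
    ("p", "t", [("b", "t")], []),
]
I1 = {"a": "t", "b": "bot", "p": "t"}
I2 = {"a": "bot", "b": "t", "p": "t"}
print(is_stable(P, I1), is_stable(P, I2))
```

An interpretation that makes both a and b true is rejected by the same check, because both negated clauses are deleted from the reduct and nothing supports a, b or p.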

Example  7. 

Let an ALPSN P = {~(b : t) → (a : t), ~(a : t) → (b : t), (a : t) → (p : t), (b : t) → (p : t)}.

If I1 = {(a : t), (b : ⊥), (p : t)}, then P^{I1} = {(a : t), (a : t) → (p : t), (b : t) → (p : t)} and T_{P^{I1}} ↑ ω = I1. Therefore, I1 is a stable model of the ALPSN P. If I2 = {(a : ⊥), (b : t), (p : t)}, then P^{I2} = {(b : t), (a : t) → (p : t), (b : t) → (p : t)} and T_{P^{I2}} ↑ ω = I2. Therefore, I2 is also a stable model of the ALPSN P.

The primary difference between ALPSN and VALPSN is in their annotations. In ALPSN, the annotations are usually symbols with a fixed meaning, such as ⊥ (unknown), f (false), t (true) and ⊤ (inconsistent). In VALPSN, on the other hand, the annotation is a vector called a vector annotation. A vector annotation is a 2-dimensional vector (i, j) such that i and j are non-negative integers, and the lattice Tv of vector annotations is defined as: Tv = {(i, j) | 0 ≤ i ≤ n, 0 ≤ j ≤ n, i, j and n are non-negative integers}. Moreover, we define the ordering ≤ on Tv. Let v1 = (x1, y1) and v2 = (x2, y2); then v1 ≤ v2 iff x1 ≤ x2 and y1 ≤ y2, where x1, x2, y1 and y2 are non-negative integers.

Roughly speaking, if the first component of a vector annotation is regarded as the strength of truth and the second component as the strength of falseness, we can provide epistemic interpretations for vector annotated literals just as for ordinary annotated literals. If p is a literal and i, j are non-negative integers, p : (i, j) is interpreted as "p is known to be true of strength i and false of strength j". This interpretation provides a definition of the epistemic negation in vector annotated logics: epistemic negation exchanges the components of a vector annotation. Let (i, j) be a vector annotation. Then ¬(i, j) = (j, i). A vector annotated literal p : (i, j) may be regarded as implying both a conflict and defeasibility. The vector annotation (i, j) is the least upper bound of the vector annotations (i, 0) and (0, j), and it can be regarded as containing both meanings, "true of strength i" and "false of strength j". Moreover, if the integer i is larger than the integer j, then the literal p may be interpreted as being relatively true, in the sense that "true of strength i" defeats "false of strength j". Therefore, VALPSN can be obtained by replacing the terms annotated and annotation in ALPSN by the terms vector annotated and vector annotation, respectively.
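
The operations on vector annotations described above are small enough to write out directly. The Python sketch below is only an illustration; the function names (`lattice`, `leq`, `join`, `neg`, `relatively_true`) are hypothetical and the ordering is read as the componentwise ≤.

```python
from itertools import product

def lattice(n):
    """All vector annotations (i, j) with 0 <= i <= n and 0 <= j <= n."""
    return list(product(range(n + 1), repeat=2))

def leq(v1, v2):
    """Componentwise ordering on vector annotations."""
    return v1[0] <= v2[0] and v1[1] <= v2[1]

def join(v1, v2):
    """Least upper bound under the componentwise ordering."""
    return (max(v1[0], v2[0]), max(v1[1], v2[1]))

def neg(v):
    """Epistemic negation: exchange the two components."""
    return (v[1], v[0])

def relatively_true(v):
    """True of strength i defeats false of strength j when i > j."""
    return v[0] > v[1]

# (i, j) is the least upper bound of (i, 0) and (0, j)
print(join((2, 0), (0, 1)), neg((2, 1)), relatively_true((2, 1)))
```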


DEFAULT  REASONING 

It is well known that logic programs with strong negation can formalize default reasoning. We have already shown in [7,8] that default reasoning can be translated into ALPSNs. In general, a default theory [9] T = (D, W) consists of a set of facts W, which are closed first-order formulas, and a set of defaults D, which are specific inference rules of the form u : v/w, where u, v and w are first-order formulas. However, in order to deal with default theories in our framework, we restrict W to be a set of generalized Horn clauses (which are allowed to contain negative literals in their heads or bodies), and D to be a set of defaults of the following form: p1 ∧ … ∧ pm : j1, …, jk / C, where p1, …, pm, j1, …, jk, C are literals. p1 ∧ … ∧ pm is the prerequisite, ji (1 ≤ i ≤ k) are the justifications and C is the consequent of the default. An informal interpretation of the default is that C may be added to the current knowledge database whenever p1, …, pm belong to that database and j1, …, jk are consistent with that database (i.e., ¬j1, …, ¬jk do not belong to that database). If T = (D, W) is a default theory, its translation tr into an ALPSN is defined as follows:

(1) for any d ∈ D such that d = p1 ∧ … ∧ pm : j1, …, jk / C,

tr(d) = (p1 : t) ∧ … ∧ (pm : t) ∧ ~(j1 : f) ∧ … ∧ ~(jk : f) → (C : t) and
tr(D) = {tr(d1), …, tr(dn)} such that di ∈ D (1 ≤ i ≤ n),

(2) for any w ∈ W such that w = A1 ∧ … ∧ Al → A0,
tr(w) = (A1 : t) ∧ … ∧ (Al : t) → (A0 : t) and

tr(W) = {tr(w1), …, tr(wk)} such that wi ∈ W (1 ≤ i ≤ k).
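
As a small illustration of the translation, the Python sketch below builds the textual form of tr(d) and tr(w) from their parts. It assumes the reading in which every prerequisite is annotated with t, every justification appears as a strongly negated literal annotated with f, and the consequent is annotated with t; the function names and the bird/flies default are hypothetical.

```python
def tr_default(prerequisites, justifications, consequent):
    """Clause for a default p1 ^ ... ^ pm : j1, ..., jk / C."""
    body = [f"({p} : t)" for p in prerequisites]
    body += [f"~({j} : f)" for j in justifications]
    return " ∧ ".join(body) + f" → ({consequent} : t)"

def tr_fact(body_literals, head):
    """Clause for a generalized Horn clause A1 ^ ... ^ Al -> A0."""
    if not body_literals:
        return f"({head} : t)"
    body = " ∧ ".join(f"({a} : t)" for a in body_literals)
    return body + f" → ({head} : t)"

# Hypothetical default: bird(x) : flies(x) / flies(x)
print(tr_default(["bird(x)"], ["flies(x)"], "flies(x)"))
print(tr_fact(["penguin(x)"], "bird(x)"))
```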

In order to demonstrate that VALPSN can deal with default reasoning, we propose a mapping from the set T of annotations into the set Tv of vector annotations. Let Tv = {(0,0), (0,1), (1,0), (1,1)}. Then the mapping is: ⊥ → (0,0), f → (0,1), t → (1,0), ⊤ → (1,1). We take the Pennsylvania-Dutch example, which has been used as an example of defeasible theories in [1], as an example of default theories in this section.




Example 8. (Pennsylvania-Dutch) [1]

This example consists of one fact, F1, two default rules, R1 and R4, and two normal rules, R2 and R3, and can be formalized as a default theory [8].

F1 : Hans (h) is a native speaker of Pennsylvania-Dutch (PD), {nspd(h)}.

R1 : native speakers of PD are usually born in Pennsylvania, {nspd(h) : bp(h)/bp(h)}.

R2 : people born in Pennsylvania are born in the USA, {bp(h) → busa(h)}.

R3 : native speakers of PD are native speakers of German, {nspd(h) → nsg(h)}.

R4 : native speakers of German are usually not born in the USA, {nsg(h) : ¬busa(h)/¬busa(h)}.


First, the above default theory is translated into an ALPSN in the way described in [6], and next, the ALPSN is translated into a VALPSN P based on the mapping from T into Tv defined above. Then we obtain the following VALPSN P.


P = {nspd(h) : (1,0), nspd(h) : (1,0) ∧ ~bp(h) : (0,1) → bp(h) : (1,0),
bp(h) : (1,0) → busa(h) : (1,0), nspd(h) : (1,0) → nsg(h) : (1,0),
nspd(h) : (1,0) ∧ ~nsg(h) : (1,0) → busa(h) : (0,1)}.

This VALPSN P has two stable models I1 and I2.

I1 = {nspd(h) : (1,0), nsg(h) : (1,0), bp(h) : (1,0), busa(h) : (1,0)},
I2 = {nspd(h) : (1,0), nsg(h) : (1,0), bp(h) : (0,1), busa(h) : (0,0)}.

I1 indicates that Hans is known to be born in Pennsylvania and in the USA. I2 indicates that Hans is known not to be born in Pennsylvania and that it is unknown whether he was born in the USA or not.

VALPSN can formalize default theories, and the stable models of a VALPSN represent the results of the default reasoning, as shown in the above example. The example is revisited in the following section from the viewpoint of defeasible reasoning. There is a conflict between the stable models I1 and I2 on the question "Where was Hans born?". We consider defeasible reasoning to resolve the conflict in the following section.


DEFEASIBLE  REASONING 

We have shown that the relation between the defeasible logic [1] and VALPSN is as shown in Figure 3. Based on this relation, we show that defeasible theories can be translated into VALPSNs, taking the same example, Pennsylvania-Dutch.

The defeasible logic [1] contains three kinds of rules: strict rules, defeasible rules and defeaters. Conflicts between defeasible rules with incompatible consequents are resolved by using explicit superiority relations on rules. The defeasible logic is defined as a set of conditions on nodes of proof trees. The alphabet is the union of the following four pairwise disjoint sets of symbols.

•  A nonempty countable set of proposition symbols.

•  The set {¬, →, ⇒, ⇝} of connectives.

•  The set {+, −, Δ, ∂} of positive, negative, definite, and defeasible proof symbols.

•  The set of punctuation marks consisting of the comma, braces and parentheses.




[Figure: a defeasible theory is translated into a VALPSN P. +Δq (q definitely provable) corresponds to I ⊨ q : (n, 0), and +∂q (q defeasibly provable) corresponds to I ⊨ q : (n−1, 0), where I is a stable model of P.]
Fig. 3. Defeasible Theory and VALPSN.

The negation of the proposition p is denoted by ¬p. The complement of the proposition p is ¬p and the complement of ¬p is p. If q is any literal then the complement of q is denoted by q̄. The positive proof symbol + indicates that the following literal has been proved. The negative proof symbol − indicates that the following literal has been proved to be unprovable. The definite proof symbol Δ indicates that the proof of the following literal cannot be defeated by more information. The defeasible proof symbol ∂ indicates that the proof of the following literal can be defeated by more information. A rule has three parts: a finite set of literals on the left, an arrow in the middle, and a literal on the right. A rule which contains the strict arrow →, for example A → q, is called a strict rule. The intuition is that whenever all the literals in A are accepted then q must be accepted. A rule which contains the defeasible arrow ⇒, for example A ⇒ q, is called a defeasible rule. If all the literals in A are accepted then q is accepted provided that there is insufficient evidence against q. A rule which contains the defeater arrow ⇝, for example A ⇝ q̄, is called a defeating rule or a defeater. If all the literals in A are accepted then A ⇝ q̄ is evidence against q, but not for q̄. It should be noted that the antecedent of a rule can be the empty set.

The defeasible logic has four inference conditions, +Δ, −Δ, +∂, and −∂. We comment on the notation used in the conditions before describing them. Let q be a literal. In a proof, +Δq indicates that q is proved definitely, −Δq indicates that it is proved that q cannot be proved definitely, +∂q indicates that q is proved defeasibly, and −∂q indicates that it is proved that q cannot be proved defeasibly. Let R be any set of rules. The set of strict rules in R is denoted by Rs, and the union of Rs and the set of defeasible rules in R by Rsd. The antecedent of any rule r is denoted by A(r) and its consequent by C(r). The set of consequents of rules in R is denoted by C(R) = {C(r) | r ∈ R}, and R[q] =def {r | r ∈ R and q = C(r)}. The superiority relation on R is any asymmetric binary relation > on R. A finite sequence P = (P(1), …, P(|P|)) of tagged literals (+Δq, −Δq, +∂q, −∂q) is called a proof. An element of a proof is called a line of the proof. P(i + 1) indicates the (i + 1)th line of a proof. P(1..i) indicates the proof lines from the first one to the ith one. The four conditions of inference in the defeasible logic are:

+Δ) If P(i + 1) = +Δq, for some literal q, then either
  .1) q ∈ F; or
  .2) ∃r ∈ Rs[q] ∀a ∈ A(r), +Δa ∈ P(1..i).

−Δ) If P(i + 1) = −Δq, for some literal q, then
  .1) q ∉ F, and
  .2) ∀r ∈ Rs[q] ∃a ∈ A(r), −Δa ∈ P(1..i).

+∂) If P(i + 1) = +∂q, for some literal q, then either
  .1) +Δq ∈ P(1..i); or
  .2) All three of the following conditions hold.
    .1) ∃r ∈ Rsd[q] ∀a ∈ A(r), +∂a ∈ P(1..i),
    .2) −Δq̄ ∈ P(1..i), and
    .3) ∀s ∈ R[q̄] either
      .1) ∃a ∈ A(s), −∂a ∈ P(1..i); or
      .2) ∃t ∈ Rsd[q] such that
        .1) ∀a ∈ A(t), +∂a ∈ P(1..i), and
        .2) t > s.




−∂) If P(i + 1) = −∂q, for some literal q, then
  .1) −Δq ∈ P(1..i); and
  .2) either
    .1) ∀r ∈ Rsd[q] ∃a ∈ A(r), −∂a ∈ P(1..i),
    .2) +Δq̄ ∈ P(1..i), or
    .3) ∃s ∈ R[q̄] such that
      .1) ∀a ∈ A(s), +∂a ∈ P(1..i); and
      .2) ∀t ∈ Rsd[q] either
        .1) ∃a ∈ A(t), −∂a ∈ P(1..i), or
        .2) not (t > s).


Example  9. 

We take the same Pennsylvania-Dutch example from Example 8 as an example of defeasible theories. The VALPSN in Example 8 has two stable models, and there is a conflict when the question "Was Hans born in the USA?" is asked. The generally agreed answer to the question is "defeasibly yes". In order to obtain the desired answer, a formalization of Pennsylvania-Dutch in the defeasible logic contains the superiority relation R1 > R4. The details of the formalization are found in [1].

F1 : nspd(h), R1 : nspd(h) ⇒ bp(h), R2 : bp(h) → busa(h),

R3 : nspd(h) → nsg(h), R4 : nsg(h) ⇒ ¬busa(h).

This defeasible theory derives +∂busa(h), which indicates that "Hans was born in the USA" is defeasibly provable, and it is translated into a VALPSN P. In the defeasible logic there are proof procedures both for the case in which conflicting literals (q and ¬q) are derived and for the case in which they are not. Since the translation has to take both cases into account, the translation rule from a defeasible theory into a VALPSN is complicated. We omit the formal definition and the details of the rule because of space restrictions. Let Tv = {(i, j) | 0 ≤ i ≤ 3, 0 ≤ j ≤ 3, i, j are integers}. Then we can formalize the defeasible theory, Pennsylvania-Dutch, in VALPSN based on the translation.

F1 is translated into {nspd(h) : (3,0)}.

R1 is translated into {nspd(h) : (2,0) ∧ nsg(h) : (2,0) ∧ ~bp(h) : (0,3) → bp(h) : (2,0),

nspd(h) : (2,0) ∧ ~nsg(h) : (2,0) ∧ ~bp(h) : (0,3) → bp(h) : (2,0)}.

R2 is translated into {bp(h) : (3,0) → busa(h) : (3,0), bp(h) : (2,0) → busa(h) : (2,0)}.

R3 is translated into {nspd(h) : (3,0) → nsg(h) : (3,0), nspd(h) : (2,0) → nsg(h) : (2,0)}.

R4 is translated into {nsg(h) : (2,0) ∧ nspd(h) : (2,0) ∧ ~busa(h) : (3,0) → busa(h) : (0,1),
nsg(h) : (2,0) ∧ ~nspd(h) : (2,0) ∧ ~busa(h) : (3,0) → busa(h) : (0,2)}.

Then the VALPSN P has the unique stable model

I = {nspd(h) : (3,0), nsg(h) : (3,0), bp(h) : (2,0), busa(h) : (2,1)}.

Since busa(h) : (2,1) implies busa(h) : (2,0), we have busa(h) : (2,0), which means that "Hans was born in the USA" is defeasibly true, as in the original defeasible theory.
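
The stable model stated for Example 9 can be checked mechanically. The following Python sketch is an illustration under stated assumptions, not the authors' PROLOG system: each clause of Example 9 is encoded as (head, annotation, positive body, strongly negated body), every clause is read with its last literal as the head, satisfaction of p : v is taken as I(p) ≥ v componentwise, and all function names are hypothetical.

```python
BOTTOM = (0, 0)

def leq(u, v):
    return u[0] <= v[0] and u[1] <= v[1]

def join(u, v):
    return (max(u[0], v[0]), max(u[1], v[1]))

def satisfies(interp, atom, ann):
    return leq(ann, interp.get(atom, BOTTOM))

def reduct(program, interp):
    """Delete blocked clauses, then drop the strongly negated literals."""
    return [(h, v, pos) for h, v, pos, neg in program
            if not any(satisfies(interp, a, m) for a, m in neg)]

def least_model(alp):
    interp, changed = {}, True
    while changed:
        changed = False
        for head, ann, pos in alp:
            if all(satisfies(interp, a, m) for a, m in pos):
                new = join(interp.get(head, BOTTOM), ann)
                if new != interp.get(head, BOTTOM):
                    interp[head], changed = new, True
    return interp

def is_stable(program, interp):
    model = least_model(reduct(program, interp))
    atoms = set(model) | set(interp)
    return all(model.get(a, BOTTOM) == interp.get(a, BOTTOM) for a in atoms)

# Clauses of Example 9 (the argument h is omitted from the atom names)
P = [
    ("nspd", (3, 0), [], []),                                            # F1
    ("bp", (2, 0), [("nspd", (2, 0)), ("nsg", (2, 0))], [("bp", (0, 3))]),   # R1
    ("bp", (2, 0), [("nspd", (2, 0))], [("nsg", (2, 0)), ("bp", (0, 3))]),   # R1
    ("busa", (3, 0), [("bp", (3, 0))], []),                              # R2
    ("busa", (2, 0), [("bp", (2, 0))], []),                              # R2
    ("nsg", (3, 0), [("nspd", (3, 0))], []),                             # R3
    ("nsg", (2, 0), [("nspd", (2, 0))], []),                             # R3
    ("busa", (0, 1), [("nsg", (2, 0)), ("nspd", (2, 0))], [("busa", (3, 0))]),  # R4
    ("busa", (0, 2), [("nsg", (2, 0))], [("nspd", (2, 0)), ("busa", (3, 0))]),  # R4
]
I = {"nspd": (3, 0), "nsg": (3, 0), "bp": (2, 0), "busa": (2, 1)}
print(is_stable(P, I))
```

Under this encoding the annotation (2,1) for busa(h) arises as the least upper bound of (2,0), contributed by R2, and (0,1), contributed by R4.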


DEFAULT  FUZZY  REASONING 

VALPSN can deal with not only defeasible reasoning but also fuzzy reasoning. In order to implement default fuzzy reasoning in the framework of VALPSN, the set of vector annotations is redefined as Tv = {(x, y) | x, y ∈ [0,1]}. The first component x indicates the degree of belief and the second component y indicates the degree of disbelief. If x + y > 1, the annotation indicates a kind of conflict; if x + y < 1, it indicates uncertainty due to lack of information; and if x + y = 1, it indicates normal belief [10]. For example, p : (0.7, 0.3) is interpreted informally as "p is 70% believed and 30% disbelieved". We show default fuzzy reasoning based on VALPSN using the following modified Penguin-Triangle as an example.
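
The three cases for a fuzzy vector annotation can be written as a short classification routine. This Python sketch is illustrative only; the function name `classify`, the returned labels and the floating-point tolerance are assumptions introduced here.

```python
import math

def classify(x, y, tol=1e-9):
    """Classify a fuzzy vector annotation (x, y) by the sum x + y."""
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("both components must lie in [0, 1]")
    total = x + y
    # Compare with a tolerance, since sums such as 0.7 + 0.3 may not be
    # represented exactly in floating point.
    if math.isclose(total, 1.0, abs_tol=tol):
        return "normal belief"
    return "conflict" if total > 1.0 else "uncertainty"

print(classify(0.7, 0.3), classify(0.8, 0.1), classify(0.8, 0.8))
```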




Example 10. (modified Penguin-Triangle) [9]

This example consists of one fact,

F1: "Tweety (t) is a penguin" is 100% believed, {p(t) : (1.0, 0.0)},

two fuzzy rules,

R1: If "Tweety is a penguin" is 100% believed, then "Tweety is a bird" is 80% believed and 10% disbelieved, {p(t) : (1.0, 0.0) → b(t) : (0.8, 0.1)},

R2: If "Tweety is a penguin" is 100% believed, then "Tweety can not fly" is 80% believed and 20% disbelieved, {p(t) : (1.0, 0.0) → ¬f(t) : (0.8, 0.2)},

and one default fuzzy rule,

R3: If "Tweety is a bird" is more than 80% believed and "Tweety can not fly" is not more than 70% believed, then "Tweety can fly" is 70% believed,

{b(t) : (0.8, 0.0) ∧ ~¬f(t) : (0.7, 0.0) → f(t) : (0.7, 0.0)}.

Taking rule R1 as an example and comparing the vector annotation (1.0, 0.0) of the antecedent with the vector annotation (0.8, 0.1) of the consequent from the viewpoint of the amount of knowledge, we see that inference by R1 reduces the amount of knowledge, which means that R1 contains uncertainty. Intuitively, "a penguin is a bird" is regarded as not 100% believed. The intuitive reasoning process of this example is as follows: from F1 and R1 we have {b(t) : (0.8, 0.1)}; from F1 and R2 we have {¬f(t) : (0.8, 0.2)}; however, we cannot obtain the consequent {f(t) : (0.7, 0.0)}, since {¬f(t) : (0.8, 0.2)} conflicts with {~¬f(t) : (0.7, 0.0)}. Then this VALPSN has only one stable model, I = {p(t) : (1.0, 0.0), b(t) : (0.8, 0.1), f(t) : (0.2, 0.8)}, which means that "Tweety is a penguin" is 100% believed and "Tweety is a bird" is 80% believed and 10% disbelieved, whereas "Tweety can fly" is 20% believed and 80% disbelieved.


CONCLUSION 

In  this  paper,  we  have  proposed  a  framework  for  intelligent  reasoning  systems  that  can  deal  with  three 
kinds  of  nonmonotonic  reasoning:  default,  defeasible,  and  fuzzy.  The  framework  consists  of  two  modules  , 
knowledge  translation  and  inferencing.  The  inference  module  is  a  stable  model  computing  system. 

We have implemented the two modules as PROLOG programs. However, the implementation has some problems. The stable model computing system is not efficient: in general, it takes a long time to compute the stable models of VALPSNs. The translation from defeasible theories into VALPSNs is so complicated that it also takes a long time. Improving the speed of both modules is left for future work.


REFERENCES 

1. Billington, D., 1997. "Conflicting Literals and Defeasible Logic", Proc. 2nd Australian Workshop on Commonsense Reasoning, 1-14.

2. Blair, H.A., Subrahmanian, V.S., 1989. "Paraconsistent Logic Programming", Theoretical Computer Science, 68, 135-154.

3. da Costa, N.C.A., Subrahmanian, V.S., Vago, C., 1989. "The Paraconsistent Logics PT", Zeitschrift für Mathematische Logik und Grundlagen der Mathematik, 37, 139-148.

4. Gelfond, M., Lifschitz, V., 1988. "The Stable Model Semantics for Logic Programming", Proc. 5th Inter. Conf. on Logic Programming, 1070-1080.

5. Lloyd, J.W., 1987. "Foundations of Logic Programming" (2nd edition), Springer-Verlag.

6. Nakamatsu, K., Abe, J.M., 1999. "Reasonings Based on Vector Annotated Logic Programs", Proc. Intl. Conf. on Computational Intelligence for Modeling, Control, and Automation, IOS Press, Vienna, Austria.

7. Nakamatsu, K., Suzuki, A., 1994. "Annotated Semantics for Default Reasoning", Proc. 3rd Pacific Rim Inter. Conf. on AI, 180-186.

8. Nakamatsu, K., Suzuki, A., 1998. "A Nonmonotonic ATMS Based on Annotated Logic Programs", in Agents and Multi-Agent Systems, LNAI 1441, Springer-Verlag.

9. Reiter, R., 1980. "A Logic for Default Reasoning", Artificial Intelligence, 13, 81-132.

10. Turksen, I.B., 1986. "Interval Valued Fuzzy Sets Based on Normal Forms", Fuzzy Sets and Systems, 20(2), 191-210.




A  Fuzzy  Logic  Assisted  Electrodynamic  Balance  for 
Unit  Operations  on  Single  Levitated  Particles 

M. Pappalardo*, A. Pellegrino*, M. d'Amore**, P. Giordano** and P. Russo**

*  Department  of  Mechanical  Engineering,  University  of  Salerno,  Fisciano,  Italy 
**  Department  of  Chemical  and  Food  Engineering,  University  of  Salerno,  Fisciano,  Italy 

**Email:  damore@dica.unisa.it 


ABSTRACT 

An  electrodynamic  balance  (EDB)  as  a  tool  for  unit  operations  on  single  sub-millimeter  particles  is 
described.  Fine  control  of  the  particle  position  is  designed  and  realized  using  either  fuzzy  logic  concepts  or 
traditional  PID  schemes.  Precision  and  efficacy  of  the  two  methods  are  compared.  A  simple  application  to 
the  drying  of  a  droplet  is  shown. 


INTRODUCTION 

Unit operations are the core of chemical engineering. Distillation, evaporation, absorption, extraction, crystallization and drying are of basic importance. However, these operations often involve beds of particles whose density and size distribution are sometimes difficult even to estimate. The lack of experimental data strongly hinders the validation of mathematical models developed to investigate the role of operating parameters in industrial processes. As a matter of fact, a number of industrial-size plants in the field of particle treatment are designed on the basis of data from pilot or bench scale apparatuses.

The electrodynamic balance appears to be an intriguing tool for the analysis of momentum and mass transfer phenomena between gas and particles or drops in crucial conditions, since it is able to hold, by means of an electric field, a single submillimeter particle suspended in space in a controlled atmosphere ([1], [2]). Some features make it a unique apparatus: i) it operates on a single particle, so problems arising from interactions between particles are absent; ii) as the boundary layer surrounding one particle can be clearly defined, heat and mass transfer phenomena are easy to approach; iii) very fast heating via a power laser, or quenching, is possible when working with a single particle, due to the very high surface to volume ratio; iv) measurements of particle characteristics and their evolution during the run (mass, density, surface area, composition, etc.) are in principle at hand; v) discrimination between homogeneous and heterogeneous phenomena is possible, considering that gaseous products quenched in a cold boundary layer are representative of what happens on the solid surface; vi) direct observation of what is going on makes optical diagnostics and other facilities easy to apply and to study. d'Amore et al. [3] used the tool to measure the apparent density of individual synthetic char particles known as "Spherocarb". Dudek et al. [4] performed oxidation rate measurements on single char particles at various temperatures. Davis and Ray [5] measured the evaporation rates of single droplets of dibutyl sebacate. Cohen et al. [6] obtained pure crystallization by letting single levitated droplets of a salt solution evaporate undisturbed.

Nevertheless, use of the balance requires some special care, as the reference point for any measurement is the chamber center and, in turn, the particle can be kept stably levitated in the center only by a continuous balancing of gravity, electrical and aerodynamic drag forces. If the forces are constant or slowly changing, manual adjustment of the electrical fields is enough for fairly good control. Fast changes require, of course, automatic devices. In the past [7], position control systems were based on conventional concepts and offered fast but sometimes not quite accurate control. It must be pointed out that fast movement of the particle may alter the boundary layer, which would cause misleading results if delicate measurements are made, for instance in studies of droplet drying, drag, or even combustion.

Whatever  the  technique  used  (either  photo-multiplier  or  photo-diodes  as  detecting  units,  and  proportional, 
integral,  derivative  or  a  combination  of  them  as  controller),  knowledge  of  the  physics  of  the  phenomenon  is 


0-7803-5489-3/99/$  10.00  ©1999  IEEE. 




necessary. In fact, as the equations of motion of a particle suspended in the balance are highly nonlinear, handling them in the design of a conventional position control may be quite complicated. Fuzzy logic appears to offer a wise way to overcome this difficulty ([8], [9], [10]). The fuzzy strategy does not require any knowledge of the mathematical models describing the process to be controlled. It is, in fact, based on a qualitative description of the phenomenon, refined step by step. Very often, control systems based on fuzzy logic show better stability than PID controllers do, with an increased tolerance in the range of the control variables. Independence from mathematical models and flexibility make fuzzy control a subject of increasing interest, with many applications in engineering, from NASA space controls to camera handling.

In this work, fuzzy logic is applied to design the position control of a particle levitated in an electrodynamic balance, with the aim of increasing the potential of the balance in the field of chemical engineering.


THE  APPARATUS 

The electrodynamic chamber consists of three electrodes in a hyperboloidal configuration. A schematic view of the balance is shown in Fig. 1.


Fig.  1.  Schematic  of  the  electrodynamic  balance. 


The chamber creates a dynamic electric field capable of suspending a single charged particle. The AC or ring electrode provides lateral stability to the particle through an imposed AC field Vac oscillating sinusoidally at ±3000 volts with variable pulsation ω. The DC top and bottom electrodes provide vertical stability by balancing, by means of a DC field Vdc, the gravitational and drag forces, thus allowing stable suspension of the charged particle at the chamber center. Changes in particle mass or charge are counterbalanced by proportional changes in the imposed Vdc to keep the particle centered. Solid particles are charged in the chamber simply by injecting them with a syringe, the electrical charges coming from triboelectricity. Single droplets are generated via a piezoelectric crystal and charged by an induction copper ring kept at 400 volts (UNIPHOTON). A specially designed microscope, a video camera, and an image processing system purposely set up for the balance allow for sizing and position monitoring of the levitated particle. Due to the particle size, which is of the order of tens of microns, a large magnification is required (up to 2000X on the PC monitor). The particle must be back-lit to have enough light for observation by microscope. The algorithm developed for the image analysis takes this into account. A vertical strip of 512 pixels is considered. Starting from the top, the position of the first black pixel is recorded. After having verified




that the pixel belongs to the particle by checking that the following ones are black too, the analysis proceeds from the bottom in the same way until another black pixel is found and checked [11]. The particle diameter is evaluated as the distance between the two black pixels in the strip. The particle center follows as a consequence, while the chamber center is a fixed point in the image plane. This information is then sent to a virtual controller based on either conventional or fuzzy logic, as discussed in detail later.
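
The strip-scanning procedure described above can be sketched as follows. This is a minimal Python illustration, not the algorithm of [11]: the strip is assumed to be a list of booleans (True for a black pixel), the number of following pixels that must also be black (`confirm`) is a hypothetical parameter, and the function names are introduced here for the example.

```python
def find_edge(strip, confirm=3, from_top=True):
    """Index of the first black pixel whose next `confirm` pixels, in the
    scanning direction, are black too; None if there is no such pixel."""
    n = len(strip)
    step = 1 if from_top else -1
    indices = range(n) if from_top else range(n - 1, -1, -1)
    for i in indices:
        run = [i + step * k for k in range(confirm + 1)]
        if all(0 <= j < n and strip[j] for j in run):
            return i
    return None

def measure_particle(strip, confirm=3):
    """Return (diameter, center) in pixels, or None if no particle is seen."""
    top = find_edge(strip, confirm, from_top=True)
    bottom = find_edge(strip, confirm, from_top=False)
    if top is None or bottom is None:
        return None
    diameter = bottom - top        # difference between the two edge pixels
    center = (top + bottom) / 2    # particle center along the strip
    return diameter, center

# Synthetic 512-pixel strip: one isolated noise pixel and a particle
strip = [False] * 512
strip[100] = True                  # noise, rejected by the confirmation check
for i in range(200, 261):
    strip[i] = True
print(measure_particle(strip))
```

The offset between the returned center and the fixed chamber-center pixel is the quantity a position controller would act on.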

Mass flow controllers and special electric heaters make it possible to operate the balance in a controlled atmosphere. An Ar-ion laser (Coherent, Innova 90, 4 W nominal power) coupled to an optical and electronic group (Dantec, Flow Velocity Analyser) allows for the characterization of the gas flow field inside the chamber.


PARTICLE  STABILITY 

The balance works as follows. A local minimum in the energy field has to be generated to keep an electrically charged particle in stable equilibrium. In a real system, at most a saddle point can be generated. If the direction of the field lines is continuously and rapidly reversed, the particle can be kept in proximity to the saddle point. A stability analysis then gives the conditions the system parameters have to satisfy for a particle to stay levitated in the balance.

The theory is fully described by Wuerker et al. [1], Frickel et al. [2], and Davis and Ray [5]. The electric field produced by Vdc is:

$$\mathbf{E}_{dc} = \frac{V_{dc}}{z_0}\,\mathbf{k} \tag{1}$$

where k is the unit vector of the vertical direction. The electric field generated by the ring electrode with a voltage v(t) = V_ac cos(ωt) applied is:

$$\mathbf{E}_r = -V_{ac}\,\frac{r}{z_0^2}\cos(\omega t)\,\mathbf{r} \tag{2}$$

$$\mathbf{E}_z = 2V_{ac}\,\frac{z}{z_0^2}\cos(\omega t)\,\mathbf{k} \tag{4}$$

where z_0 is the distance of the endcap electrode from the geometrical center of the chamber and r, φ and z are the polar coordinates.

The continuous field Edc allows for balancing the gravity force (qEdc = mg), whereas the Eac field generates a saddle point in the chamber center. The intensity and sign of the related field lines vary with time according to a sinusoidal law. Since the saddle point is not a point of stable equilibrium, the forces acting on the particle, and consequently all the related parameters (i.e. Vac, ω, Vdc, q, m), could not vary arbitrarily without affecting the particle equilibrium if the only constraint were

$$V_{dc} = z_0\,\frac{m g}{q} \tag{5}$$

It is thus necessary to identify all the constraints the parameters above must respect to guarantee stable equilibrium conditions.

If  the  differential  equations  of  the  motion  of  a  charged  particle  in  the  chamber  are  considered  [5],  then: 


(1)  a  charged  particle  can  be  stably  kept  in  the  balance  if  the  following  conditions  are  satisfied: 

$$0 < q_z = \frac{4\,q\,V_{ac}}{\omega^2 z_0^2\,m} < 0.908 \qquad (6)$$

[Eq. 7, the second stability condition, is illegible in the source.]




(2) in a stability region a charged particle is always radially centered.

The relations above are more useful for our purposes when transformed as follows:

$$\nu > 16.55\,\sqrt{\frac{V_{ac}}{V_{dc}}} \qquad (8)$$

[Eq. 9, the upper bound on $\nu$, is illegible in the source.]

where

$$\nu = \frac{\omega}{2\pi} \qquad (10)$$

These relations give the range of frequencies at which the particle is stably kept at the chamber
center, as a function of either measurable or known variables. It should be noted that, since the relations
above are inequalities, they do not reduce the number of degrees of freedom of the system.
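The first stability condition can be checked numerically. The sketch below is illustrative only: it covers only the bound 0 < q < 0.908 on the stability parameter, it assumes the parameter has the form 4 q Vac / (ω² z0² m), and the function names and numerical values are hypothetical.

```python
import math

QZ_MAX = 0.908  # upper stability limit quoted in the text


def stability_parameter(charge, mass, v_ac, omega, z0):
    """Dimensionless parameter 4*q*V_ac / (omega^2 * z0^2 * m) (assumed form)."""
    return 4.0 * charge * v_ac / (omega ** 2 * z0 ** 2 * mass)


def is_stable(charge, mass, v_ac, omega, z0):
    """True when the parameter lies strictly between 0 and QZ_MAX."""
    return 0.0 < stability_parameter(charge, mass, v_ac, omega, z0) < QZ_MAX


def min_frequency_hz(charge, mass, v_ac, z0):
    """Frequency nu = omega / (2*pi) at which the parameter equals QZ_MAX."""
    omega_min = math.sqrt(4.0 * charge * v_ac / (QZ_MAX * z0 ** 2 * mass))
    return omega_min / (2.0 * math.pi)


# Hypothetical values: q/m = 1e-4 C/kg, V_ac = 1000 V, z0 = 5 mm.
nu_min = min_frequency_hz(1e-13, 1e-9, 1000.0, 5e-3)
print(f"lower frequency bound: {nu_min:.1f} Hz")
```

Because the parameter scales as 1/ω², raising the frequency above this bound always satisfies the condition; the upper limit on the frequency comes from the second stability condition, which is not modelled here.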

However, satisfying the stability conditions allows the particle to be levitated but does not guarantee that
it will occupy the center of the chamber. On the other hand, the chamber center is the only point of
the field where the equilibrium of all the acting forces is satisfied and the balance equations can be solved.
The need for a particle position control system is thus evident.


POSITIONING  CONTROL 

The aim of the control is to determine the equilibrium value of Vdc needed to bring the particle to the
stability position. The distance h of the particle from the center has been selected as the measured variable,
while Vdc is the manipulated variable, used to keep the particle at the center by counterbalancing with the
electrical force any change in the other forces acting on it. Both the image analyzer and the AD/DA
interface between computer and equipment were deliberately chosen because they can be programmed in the
same language. It was thus possible to write a single algorithm for sizing and locating the particle,
evaluating the stability parameters and controlling the particle position. A virtual digital controller was
implemented in Pascal. The two kinds of control realized for operating the balance are described below:
a classic PID, as a reference tool, and a fuzzy-logic-based control.


PID  control 

A proportional-integral-derivative (PID) control in velocity form, which is particularly suitable for
unknown or variable set-points [12], has been designed for the system. Since the controller is discrete
(sampling rate 30 s⁻¹), the system stability is a function of K, T/I and D/T, where T is the
sampling time, K is the proportional gain, and I and D are the time constants of the integral and derivative
actions, respectively. The last three particle positions are required by this controller.


A force balance along the vertical axis gives:

$$m\,\frac{d^2 z'}{dt^2} = -3\pi\mu D\,\frac{dz'}{dt} - m' g + q\,\frac{V_{dc}}{z_0} + q\,\frac{2V_{ac}}{z_0^2}\,z'\cos(\omega t) \qquad (11)$$

where $z'$ and $m'$ are the deviation variables for particle position and mass, respectively. Thus, for a given
particle there will be a set of values of K, T/I and D/T which can control the position by adjusting Vdc to
the equilibrium value. The control law is:


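$$\Delta V_{dc} = K\left[\left(1 + \frac{T}{I} + \frac{D}{T}\right)A_i - \left(1 + \frac{2D}{T}\right)B_i + \frac{D}{T}\,C_i\right]\cdot Diam \qquad (12)$$

where $A_i$, $B_i$, $C_i$ are the last three positions and $Diam$ is the particle diameter.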
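A minimal sketch of one step of the velocity-form control law (Eq. 12) is given below. It is not the authors' Pascal routine: the function name and the tuning values are hypothetical, and it assumes that Ai is the most recent position, Bi the previous one and Ci the one before that, all measured as deviations from the chamber center.

```python
def pid_velocity_step(k, t_over_i, d_over_t, a_i, b_i, c_i, diam):
    """Increment of V_dc computed from the last three positions (a_i assumed newest)."""
    return k * ((1.0 + t_over_i + d_over_t) * a_i
                - (1.0 + 2.0 * d_over_t) * b_i
                + d_over_t * c_i) * diam


# Hypothetical tuning and three successive, identical deviations from the center.
v_dc = 500.0
v_dc += pid_velocity_step(k=2.0, t_over_i=0.1, d_over_t=0.5,
                          a_i=1.0, b_i=1.0, c_i=1.0, diam=1.0)
```

With a constant deviation the proportional and derivative contributions cancel and only the integral term T/I keeps changing Vdc, which is why the velocity form needs no explicit set-point value for the manipulated variable.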

The equations describing the forces acting on the system have variable coefficients and are thus difficult
to handle. The values of the three parameters must therefore be evaluated experimentally. However, the
system is highly nonlinear, so the PID controller is not easily optimized and cannot maintain the same
quality of control as the disturbance varies.

The adoption of fuzzy control was prompted by the nonlinearity of the system and by the
prospect of improving particle position control in the presence of changes in the force field.

The  FUZZY  controller 

Designing a fuzzy controller requires four well-defined steps:

i)  verbally  modeling  the  system  to  be  controlled 

ii)  identifying  the  control  laws  which  relate  DC  variations  to  particle  position 

iii)  formulating  the  fuzzy  rules  which  constitute  the  control 

iv)  validating  the  fuzzy  controller 

We define an algorithm which exhaustively enumerates all rules consistent with the qualitative model,
having defined a rule as a structure:

"If $T_h$ is ... then $\Delta V_{dc}$ is ..." (13)

where each $T_h$ is a test on the observable parameters $P_i$ and $P_{i-1}$ (testing either $P_i > k$ or $P_i < k$,
together with $P_i - P_{i-1} > k'$ or $P_i - P_{i-1} < k'$, where $k$ and $k'$ are constants). Unlike the PID
controller, the last two particle positions ($P_i$ and $P_{i-1}$) are enough to control the particle.

29 rules ($T_j$) are determined; the tests $P_i > k$ (0 being the equilibrium position) and $P_i - P_{i-1} > k'$ are
expressed by the adjectives greater than, much greater than, less than and much less than. On the basis of
this approach and of simple physical considerations, $\Delta V_{dc}$ is assigned for each of the above rules.

An example of the Turbo Pascal algorithm is:

    { Ai, Bi: particle positions; DVdc: change applied to Vdc }
    if Ai > gy then
    begin
      if (Bi - Ai) > hy then
        DVdc := -my
      else if (Bi - Ai) >= 0 then
        DVdc := -ny
      else if ((Bi - Ai) < 0) and (Bi >= 0) then
        DVdc := -ty
      else if Bi < 0 then
        DVdc := -ry
    end
    else ...

where gy and hy are distances, while my, ny, ty and ry are positive ΔVdc values (ΔVdc is written DVdc in the code).

The ΔVdc values are then optimized according to experimental results.
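The rule fragment quoted from the Turbo Pascal algorithm can be restated as a small function for experimentation. The sketch below mirrors only the branch shown in the paper (position Ai above the threshold gy); the function name is hypothetical, the remaining rules are elided in the source and are therefore not reproduced, and the threshold and step values in the example are placeholders rather than the experimentally optimized ones.

```python
def delta_vdc_upper_branch(a_i, b_i, gy, hy, my, ny, ty, ry):
    """Return the change in V_dc for the branch a_i > gy, or None otherwise."""
    if a_i <= gy:
        return None  # rules for this case are elided in the source
    diff = b_i - a_i
    if diff > hy:
        return -my
    if diff >= 0:
        return -ny
    if b_i >= 0:  # reached only when diff < 0
        return -ty
    return -ry  # diff < 0 and b_i < 0


# Placeholder thresholds (gy, hy) and positive step sizes (my, ny, ty, ry).
step = delta_vdc_upper_branch(10.0, 12.0, gy=5.0, hy=5.0,
                              my=4.0, ny=3.0, ty=2.0, ry=1.0)
```

Each branch corresponds to one crisp test pair on the position and on the difference between the two positions, so the full controller would be a table of such branches, one per rule.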


RESULTS 

The effectiveness of both position controllers was tested by inducing variations in the drag
forces acting on levitated glass spheres smaller than 200 µm. The flow rate of the gas entering the balance
through the bottom electrode was varied via a computer-driven flow controller so as to obtain linear variations
in the gas-to-particle slip velocity.

Figures 2(a) and 2(b) show the deviation from the center position as a function of time for a 110 µm glass
sphere undergoing a variable drag force, controlled by a PID and a fuzzy system, respectively.




Fig. 2. Position of particle center as a function of time with a continuous change in the system of forces.
Particle diameter = 110 µm.


It can be seen that particle movement is limited to the order of ±2 µm at most, i.e. 1% of the particle
diameter, when fuzzy control is used. The PID controller, despite good control in the first ten seconds of
the run, seems unable to keep the particle at the center as the drag force increases: the particle is held a
few microns away from the chamber center and then begins to oscillate by ±6 µm around it. Note that in the
absence of any position control, the particle is lost within the first second of applying the drag force.

From the comparison it appears that fuzzy control restores particle equilibrium more rapidly and reacts
better to system instabilities. Figures 3(a) and 3(b) compare the results obtained with the two controls on a
60 µm glass sphere, with the drag force increased according to a power law so as to balance the
particle weight.


Fig. 3. Position of particle center as a function of time with a continuous change in the system of forces.
Particle diameter = 60 µm.


The fuzzy controller once again gives a better response. It rapidly brings the particle back to the
equilibrium position and compensates well for changes in the external forces, without the
oscillations shown by the PID controller in Fig. 3(a). Nevertheless, when the imposed drag force is very
high, equilibrium is reached with increasing difficulty.




It is noteworthy that fuzzy control behaves better even at the beginning of the run, when the
particle undergoes a sort of shock. With PID control, the deviations of the particle position from the center
are significant, and their amplitude and kind seem to be independent of particle size. With fuzzy
control, some minor deviations appear at high values of the imposed changes in the drag force. As an
example of the efficacy of the control realized, Fig. 4 shows the results of an evaporation
test performed at room temperature on a milk droplet levitated in the balance under fuzzy position control.

In the figure the droplet diameter is reported as a function of time. Some interesting features are shown.
The smoothness of the drying curve is direct evidence of the quality of the fuzzy control realized, as it
means that the particle is drying undisturbed at the chamber center. The well-defined inflection point is
also intriguing. The curve is of great help in modeling the drying of milk.


Fig.  4.  Diameter  of  an  evaporating  milk  droplet  as  a  function  of  time 


CONCLUSIONS 

The fuzzy-logic-based position control realized appears to offer remarkable advantages over
traditional systems, as it can be set up without in-depth knowledge of the phenomenon to be controlled.
Moreover, the control appears to be more rapid and accurate. The balance in this arrangement is suitable for
fundamental studies of mass transfer on single solid or liquid particles, for instance in spray-drying, or for
measurements of the aerodynamic drag forces acting on solid particles in either cold or hot gases.

REFERENCES 

1. R.F. Wuerker, H. Shelton, R.V. Langmuir, 1959. Electrodynamic Containment of Charged Particles. J.
Appl. Phys. 30, 342.

2. R.H. Frickel, R.E. Shaffer, J.B. Stamatoff, 1978. Techn. Rep. ARCEL-TR77041, U.S., Command,
Aberdeen, Maryland.

3. M. d'Amore, R.D. Dudek, A.F. Sarofim, J.P. Longwell, 1988. Apparent Particle Density of a Fine
Particle. Powder Technology, 129-134.

4. E. Bar-Ziv, D. Jones, R. Spjut, D. Dudek, A. Sarofim, J. Longwell, 1989. Combust. & Flame, 75-81.

5. E.J. Davis, A.K. Ray, 1980. Single aerosol particle size and mass measurements using an
electrodynamic balance. J. Colloid Interface Sci., 566-576.

6. M.D. Cohen, R.C. Flagan, J.H. Seinfeld, 1987. Studies of Concentrated Electrolyte Solutions Using the
Electrodynamic Balance. 3. Solute Nucleation. J. Phys. Chem. 91, 4583-4590.

7. R.E. Spjut, E. Bar-Ziv, A.F. Sarofim, J.P. Longwell, 1986. Electrodynamic Thermogravimetric
Analyzer. Rev. Sci. Instrum. 57, 1604.

8. L.A. Zadeh, 1965. Fuzzy sets. Information and Control 8, 338-353.

9. L.A. Zadeh, 1992. The calculus of fuzzy if-then rules. AI Expert, March.

10. B. Kosko, 1992. Neural Networks and Fuzzy Systems. Prentice Hall Inter. Ed., Englewood Cliffs, N.J.

11. M. d'Amore, P. Giordano, P. Russo, 1998. Mass transfer measurements from single levitated droplets in
electrodynamic balance. Proc. of SIMAI IV, Catania, Italy, June 1998, II, 308-315.

12. G. Stephanopoulos, 1984. Chemical process control. Prentice Hall International.




AUTHOR’S  INDEX 


M.F. Abbod  215
S.M. Adballah  1017
S. Ahn  1047
J.M. Abe  695
A. Akhtar  417
J. Ahola  531
N. Aikawa  607
J.D. Allen, Jr.  961, 989
K. Ameyama  1041
W. Andreoni  1397
A. Arioti  629
J.F. Atkinson  347
G. Baiden  53
T.J. Bailey  921
J. Balcita  499
P. Barr  111
R. Barton  269
O.A. Bascur  829
D. Bassi  975
M. Benedict  1185
R.R. Biggers  1258, 1317
Y. Bissiri  635
H. Bode  339
G. Bonifazi  465, 485
B.M. Brasfield  347
J.C. Bressiani  797
R.T. Bui  749
J.D. Busbee  1258, 1317
T.L. Calton  347
J.J. Campbell  939
N. Cappetti  185
M.J. Cardew-Hall  1017
L.-E. Carlsson  459
J.C. Cassa  291, 381
O. Castillo  151, 855
A.C.D. Chaklader  797
T. Chandra  105
T. Chashikawa  453
C.C. Chang  789
J.Y. Chen  901
L.S. Chen  805
M.Y. Chen  395
N. Chen  1381
N. Chen  1419
R. Chen  1419
Y.-M. Chen  1061
D. Cheung  1079
C.-C. Chiang  1131
D.J. Choo  947
C.Y. Chung  1023
S. Cierpisz  933
D.J. Clancy  871
E.J. Colville  649
J.A. Cooper  191
C.S. Cornelius  325
L. Cser  531
C. Curtis  317
J. Daams  1339
M. d'Amore  703
D. Dasgupta  257
W.J. Davis  615
L.R.P. De Andrade Lima  505
D.V. Dempsey  1258, 1317
S. Dessureault  145
R.J. Dippenaar  75
S. Dolinsek  847
A. Donnarumma  185, 663
R. Doraiswami  735
B.F. Duan  1361
M. Duarte  975
K. Dudek  543
S. Dunbar  145, 635
M.N. Durakbasa  927
S.A. Ehikioya  139
H. Eldeib  447
J. Endou  817
J.R. Esslinger  331
R.N. Evans  331
M. Fabiunke  655
C. Fantozzi  629
N. Farmer  879
P. Farrington  157
M. Fathi-Torbaghan  1011


F. Ferguson  317
R. Felix  299
M. Ferry  105
G. Floridia  291, 381
G.A. Fodor  895
S. Forouzi  967
S. Forrest  257
J.M. Fragomeni  577, 585
W.G. Frazier  1139
K. Fujii  453
S. Fuks  1123
M. Furukawa  1115
Y. Furukawa  21
Y. Fukuhara  743
M. Geiger  641
L.M. Geng  91
R. Gerth  1151
D.T. Gethin  513, 1035
M.M. Ghomshei  519
H. Ghulman  1151
D.A. Gibson  325
P. Giordano  703
Z. Gomolka  813
G.D. Gonzalez  59
M. Granchi  629
J.L. Grantner  895
R.W. Grimes  1197
W.A. Gruver  839
A. Grzech  823
C. Guist  339
S.R. Gunn  361
M.J. Guo  779
Y.M. Guo  861
M. Gupta  119
A. Hambaba  1073
S. Hanada  429
R.D. Harrell  157, 347
J. Hart  879
Y. Hasegawa  695
J. Hatonen  459
H. Helman  537, 549, 561
L. Hildebrand  1011
T. Hirasawa  221
T. Hirasawa  245
S. Hirose  233
B. Hlavacek  403
C.T.T. Ho  1061
P.D. Hodgson  389, 953
D.A. Holder  157
D.A. Holder  347
R.-Q. Hsu  1029
C.-C. Hu  1131
H.M. Huang  285
I.B. Huang  423
W. Huang  1277
P. Hubik  437
G. Huh  947
Y.S. Hwang  879
H. Hyotyniemi  11, 179, 459
B. Igelnik  367
K. Ishida  373
K. Ishino  1093
N. Ivezic  961, 989
S. Iwata  1323, 1399
A.G. Jackson  1185
J. Jang  947
H.-G. Jeong  429
P.D. Jero  1241
L. Jin  805
J.G. Jones  1241, 1258, 1317
H.K. Jung  593
K.D. Jung  593
B. Kadar  131
R. Kainuma  373
K. Kamitani  565
J.S. Kandola  361
C.G. Kang  593
S. Kang  1047
A. Karcher  623
A. Katayama  607
M. Kato  373
K. Katoh  571
Y. Kawazoe  ..355
A.R. Khoei  513, 1035
H.S. Kim  429
J. Kim  1263
N.N. Kiselyova  1387


E.P. Paladini  165
R. Pakalnis  983
Y.-H. Pao  367, 1361
M. Pappalardo  185, 663, 703
Z. Pawlak  37
G.H. Park  1285
G.R. Park  1263
J.H. Park  1285
S.H. Park  947
P.E. Parker  895
C.D.M. Pataro  549
A. Pellegrino  703
J. Perron  749
M. Pietrzyk  773
J. Poindexter  1163
P. Qin  1419
M.V. Quintella-Cury  1123
C. Reidsema  1055
S. Reimann  1103
S. Rajan  735
P.A.S. Reed  361
D.A. Ress  1225
D. Rochoviak  157
J. Rogers  157
B.F. Rolfe  1017
J.A. Romagnoli  947
D. Roy  281
P. Russo  703
D. Russell  157
Y. Sahai  119
T. Saito  723
Y. Saito  555
Y. Sakamoto  221, 245
G. Samadi Hosseinali  417
I.V. Samarasekera  73
S. Sandig  251
E. Santoro  185
M. Sato-Ilic  207
M. Scoble  145
H.-J. Sebastian  ..163
S. Serranti  465
J. Sestak  403, 431, 437
V. Sestakova  437
B. Shi  687, 1003
Y. Shigaki  561
S. Shima  221, 245, 563
F. Shimaya  555
M. Shinkawa  555
S.S. Shivathaya  105
G.R. Shumaker  1163
O. Simula  531
I. Sinclair  361
A.K. Siren  477
M.H. Smith  839
G.W. Snyder  331
S.J. Spencer  939
B. Stepanek  437
O. Stephansson  471
M. Stevenson  735
A.R. Souza  291, 381
Z. Strnad  431
B.A. Stucke  1163
A. Suarez  975
Y. Suehiro  1041
S. Sugiyama  909
A. Suzuki  695
H. Szczerbicka  269
E. Szczerbicki  813, 1055
M. Tabib-Azar  367, 1249
Y. Takefuji  453, 723, 743, 995, 1109, 1271, 1277
A. Talaie  947
J. Tenner  921
L. Tikasz  749
K.M. Tiwari  281
S.K. Tiwari  281
A. Torres  309
V. Torres  309
T.T. Tran  139
D. Tromans  411
E.G. Truelove  139
U.-B. Tsai  1029
H.-L. Tsoi  1079
I.B. Turksen  173
O. Unold  277
T. Ushio  299
H. Utsunomiya  555


T. Van Le  675
M.M. Veiga  797
S.M.B. Veiga  797
E. Vettori  629
N.K. Vidyarthi  281
Z.J. Viharos  847
P. Villars  1339
P. Villars  1399
R.C. Villas Boas  481, 505
F. Volpe  465
R.H. Wagoner  91
B. Wang  953
D. Wang  805
W.X. Wang  471
Z. Wang  1067
M. Watanabe  1115
K.R. Weller  939
G.A.W. West  1017
P. Wiesner  251
M. Wiliams  785
J. Wirtz  623
C.-W. Wu  1029
L. Xie  1067
Y. Xu  805
T. Yamamoto  221
M. Yamaura  233
M. Yang  607
Y.Y. Yang  755
S.K. Yen  423, 779, 789
R. Ylinen  11
A. Yoshida  221
S. Yoshihara  571, 601
N. Yoshiike  1109
G.T. Yu  785
W.S. Yu  1125
M.O. Zacate  1197
L.A. Zadeh  3
T. Zacharia  961, 989
H.Z. Zan  779
L.E. Zarate  537
B.L. Zhang  867
T. Zhang  839
X. Zhang  887
Y.L. Zhao  1361
B.S. Zhu  867
D.D. Zhu  1381
H.J. Zimmermann  45
R. Zuco  465


